Dataset statistics
| | Original Data | Synthetic Data |
|---|---|---|
| Number of variables | 15 | 15 |
| Number of observations | 14000 | 14000 |
| Missing cells | 0 | 6 |
| Missing cells (%) | 0.0% | < 0.1% |
| Duplicate rows | 7 | 4 |
| Duplicate rows (%) | 0.1% | < 0.1% |
| Total size in memory | 1.6 MiB | 1.6 MiB |
| Average record size in memory | 120.0 B | 120.0 B |
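The percentage and size rows above follow from the raw counts. A minimal sketch of that arithmetic, assuming missing cells are measured against all cells (rows times variables) and duplicates against rows; the helper name `pct` is illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" table above.
N_ROWS, N_COLS = 14_000, 15


def pct(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Missing cells are a share of all cells (rows x variables).
missing_pct_synth = pct(6, N_ROWS * N_COLS)

# Duplicate rows are a share of all rows.
dup_pct_orig = pct(7, N_ROWS)
dup_pct_synth = pct(4, N_ROWS)

# Total size implied by the average record size of 120.0 B.
total_mib = 120.0 * N_ROWS / 2**20

print(f"missing (synthetic): {missing_pct_synth:.4f}%")
print(f"duplicates: {dup_pct_orig:.3f}% vs {dup_pct_synth:.3f}%")
print(f"total size: {total_mib:.1f} MiB")
```

Both synthetic shares fall well below 0.1%, which is consistent with the report showing them as "< 0.1%".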
Variable types
| | Original Data | Synthetic Data |
|---|---|---|
| Numeric | 6 | 5 |
| Categorical | 9 | 10 |
| Original Data | Synthetic Data | |
|---|---|---|
| Dataset has 7 (0.1%) duplicate rows | Dataset has 4 (< 0.1%) duplicate rows | Duplicates |
| education_num is highly overall correlated with education | education_num is highly overall correlated with education | High Correlation |
| education is highly overall correlated with education_num | education is highly overall correlated with education_num | High Correlation |
| relationship is highly overall correlated with gender | relationship is highly overall correlated with gender | High Correlation |
| gender is highly overall correlated with relationship | gender is highly overall correlated with relationship | High Correlation |
| race is highly imbalanced (65.3%) | race is highly imbalanced (75.2%) | Imbalance |
| native_country is highly imbalanced (82.5%) | native_country is highly imbalanced (84.9%) | Imbalance |
| capital_gain has 12811 (91.5%) zeros | Alert not present in | Zeros |
| capital_loss has 13354 (95.4%) zeros | capital_loss has 13659 (97.6%) zeros | Zeros |
| Alert not present in | capital_gain has a high cardinality: 108 distinct values | High Cardinality |
| Alert not present in | capital_gain is highly imbalanced (90.5%) | Imbalance |
| Alert not present in | fnlwgt is highly skewed (γ1 = 33.69087698) | Skewed |
| Alert not present in | capital_loss is highly skewed (γ1 = 30.03470007) | Skewed |
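The alert table above can be read as two sets, one per dataset. A minimal sketch, with the alerts transcribed by hand as (column, alert type) pairs and the label "dataset" used as a placeholder for the dataset-level duplicates alert, lists which alerts appear on only one side:

```python
# Alerts transcribed from the table above as (column, alert type) pairs.
# "dataset" is a placeholder label for the dataset-level duplicates alert.
shared = {
    ("dataset", "Duplicates"),
    ("education_num", "High Correlation"),
    ("education", "High Correlation"),
    ("relationship", "High Correlation"),
    ("gender", "High Correlation"),
    ("race", "Imbalance"),
    ("native_country", "Imbalance"),
    ("capital_loss", "Zeros"),
}

original_alerts = shared | {("capital_gain", "Zeros")}
synthetic_alerts = shared | {
    ("capital_gain", "High Cardinality"),
    ("capital_gain", "Imbalance"),
    ("fnlwgt", "Skewed"),
    ("capital_loss", "Skewed"),
}

# Set differences isolate the alerts raised for one dataset only.
only_original = original_alerts - synthetic_alerts
only_synthetic = synthetic_alerts - original_alerts

print("Only in Original Data:", sorted(only_original))
print("Only in Synthetic Data:", sorted(only_synthetic))
```

Every one-sided alert involves capital_gain, capital_loss, or fnlwgt, so those columns are a reasonable starting point when comparing the two datasets.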
Reproduction
| | Original Data | Synthetic Data |
|---|---|---|
| Analysis started | 2023-01-21 11:11:05.341314 | 2023-01-21 11:11:15.184928 |
| Analysis finished | 2023-01-21 11:11:15.161551 | 2023-01-21 11:11:22.636424 |
| Duration | 9.82 seconds | 7.45 seconds |
| Software version | pandas-profiling v3.6.2 | pandas-profiling v3.6.2 |
| Download configuration | config.json | config.json |
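The Duration row can be checked against the start and finish timestamps. A small sketch using only the standard library, with the timestamps copied from the table above:

```python
from datetime import datetime

# (started, finished) timestamps copied from the Reproduction table.
runs = {
    "Original Data": ("2023-01-21 11:11:05.341314", "2023-01-21 11:11:15.161551"),
    "Synthetic Data": ("2023-01-21 11:11:15.184928", "2023-01-21 11:11:22.636424"),
}

# Duration in seconds is the difference between finish and start.
durations = {
    name: (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
    for name, (start, end) in runs.items()
}

for name, seconds in durations.items():
    print(f"{name}: {seconds:.2f} seconds")
```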
age
Real number (ℝ)
| | Original Data | Synthetic Data |
|---|---|---|
| Distinct | 72 | 69 |
| Distinct (%) | 0.5% | 0.5% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 38.492714 | 38.523286 |
| | Original Data | Synthetic Data |
|---|---|---|
| Minimum | 17 | 1 |
| Maximum | 90 | 90 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
Quantile statistics
| | Original Data | Synthetic Data |
|---|---|---|
| Minimum | 17 | 1 |
| 5-th percentile | 20 | 20 |
| Q1 | 27 | 27 |
| median | 37 | 38 |
| Q3 | 47 | 48 |
| 95-th percentile | 63 | 62 |
| Maximum | 90 | 90 |
| Range | 73 | 89 |
| Interquartile range (IQR) | 20 | 21 |
Descriptive statistics
| | Original Data | Synthetic Data |
|---|---|---|
| Standard deviation | 13.684022 | 13.439151 |
| Coefficient of variation (CV) | 0.35549642 | 0.34885787 |
| Kurtosis | -0.098493391 | -0.30808631 |
| Mean | 38.492714 | 38.523286 |
| Median Absolute Deviation (MAD) | 10 | 10 |
| Skewness | 0.58416581 | 0.4923188 |
| Sum | 538898 | 539326 |
| Variance | 187.25246 | 180.61079 |
| Monotonicity | Not monotonic | Not monotonic |
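Several rows in the age tables are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A short sketch that recomputes them from the reported figures (the dictionary layout is illustrative only):

```python
# Reported statistics for `age`, copied from the tables above.
age = {
    "Original Data": {"min": 17, "max": 90, "q1": 27, "q3": 47,
                      "mean": 38.492714, "std": 13.684022},
    "Synthetic Data": {"min": 1, "max": 90, "q1": 27, "q3": 48,
                       "mean": 38.523286, "std": 13.439151},
}

derived = {}
for name, s in age.items():
    derived[name] = {
        "range": s["max"] - s["min"],   # Maximum - Minimum
        "iqr": s["q3"] - s["q1"],       # Q3 - Q1
        "variance": s["std"] ** 2,      # squared standard deviation
        "cv": s["std"] / s["mean"],     # coefficient of variation
    }

for name, d in derived.items():
    print(name, d)
```

The wider synthetic range (89 against 73) comes entirely from the single synthetic record with age 1.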
| Value | Count | Frequency (%) |
| 36 | 381 | 2.7% |
| 23 | 381 | 2.7% |
| 30 | 379 | 2.7% |
| 34 | 373 | 2.7% |
| 31 | 372 | 2.7% |
| 28 | 369 | 2.6% |
| 41 | 368 | 2.6% |
| 37 | 367 | 2.6% |
| 24 | 366 | 2.6% |
| 25 | 364 | 2.6% |
| Other values (62) | 10280 |
| Value | Count | Frequency (%) |
| 28 | 515 | 3.7% |
| 25 | 479 | 3.4% |
| 24 | 466 | 3.3% |
| 40 | 449 | 3.2% |
| 41 | 440 | 3.1% |
| 48 | 421 | 3.0% |
| 45 | 412 | 2.9% |
| 21 | 406 | 2.9% |
| 23 | 398 | 2.8% |
| 46 | 397 | 2.8% |
| Other values (59) | 9617 |
| Value | Count | Frequency (%) |
| 17 | 184 | |
| 18 | 237 | |
| 19 | 273 | |
| 20 | 341 | |
| 21 | 326 | |
| 22 | 341 | |
| 23 | 381 | |
| 24 | 366 | |
| 25 | 364 | |
| 26 | 334 |
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 17 | 134 | 1.0% |
| 18 | 206 | |
| 19 | 219 | |
| 20 | 360 | |
| 21 | 406 | |
| 22 | 262 | |
| 23 | 398 | |
| 24 | 466 | |
| 25 | 479 |
workclass
Categorical
| | Original Data | Synthetic Data |
|---|---|---|
| Distinct | 8 | 8 |
| Distinct (%) | 0.1% | 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
| Value (Original Data) | Count |
|---|---|
| Private | 9743 |
| Self-emp-not-inc | 1116 |
| Local-gov | 887 |
| ? | 810 |
| State-gov | 555 |
| Other values (3) | 889 |

| Value (Synthetic Data) | Count |
|---|---|
| Private | 10258 |
| ? | 884 |
| Self-emp-not-inc | 857 |
| Local-gov | 759 |
| Self-emp-inc | 422 |
| Other values (3) | 820 |
Length
| | Original Data | Synthetic Data |
|---|---|---|
| Max length | 16 | 16 |
| Median length | 7 | 7 |
| Mean length | 7.8645714 | 7.6055 |
| Min length | 1 | 1 |
Characters and Unicode
| | Original Data | Synthetic Data |
|---|---|---|
| Total characters | 110104 | 106477 |
| Distinct characters | 25 | 25 |
| Distinct categories | 4 | 4 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Original Data | Synthetic Data |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Original Data | Synthetic Data |
|---|---|---|
| 1st row | ? | ? |
| 2nd row | Private | Private |
| 3rd row | Private | Private |
| 4th row | Private | Private |
| 5th row | ? | Private |
Common Values
| Value | Count | Frequency (%) |
| Private | 9743 | |
| Self-emp-not-inc | 1116 | 8.0% |
| Local-gov | 887 | 6.3% |
| ? | 810 | 5.8% |
| State-gov | 555 | 4.0% |
| Self-emp-inc | 480 | 3.4% |
| Federal-gov | 399 | 2.9% |
| Without-pay | 10 | 0.1% |
| Value | Count | Frequency (%) |
| Private | 10258 | |
| ? | 884 | 6.3% |
| Self-emp-not-inc | 857 | 6.1% |
| Local-gov | 759 | 5.4% |
| Self-emp-inc | 422 | 3.0% |
| State-gov | 420 | 3.0% |
| Federal-gov | 396 | 2.8% |
| Without-pay | 4 | < 0.1% |
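Some Frequency (%) cells in these tables are blank, most likely because the report drew the value inside a bar that did not survive extraction. A minimal sketch of how a blank share can be recomputed from the counts, using the Original Data workclass counts above:

```python
# Original Data counts for `workclass`, copied from the table above.
workclass_counts = {
    "Private": 9743,
    "Self-emp-not-inc": 1116,
    "Local-gov": 887,
    "?": 810,
    "State-gov": 555,
    "Self-emp-inc": 480,
    "Federal-gov": 399,
    "Without-pay": 10,
}

# The counts cover every row, so their sum is the number of observations.
total = sum(workclass_counts.values())

# Share of each category as a percentage of all observations.
shares = {value: 100.0 * count / total for value, count in workclass_counts.items()}

print(f"total rows: {total}")
print(f"Private: {shares['Private']:.1f}%")
```

The same division applies to the Synthetic Data table, where Private accounts for 10258 of 14000 rows.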
Length
Common Values (Plot)
Original Data
Synthetic Data
| Value | Count | Frequency (%) |
| private | 9743 | |
| self-emp-not-inc | 1116 | 8.0% |
| local-gov | 887 | 6.3% |
| | 810 | 5.8% |
| state-gov | 555 | 4.0% |
| self-emp-inc | 480 | 3.4% |
| federal-gov | 399 | 2.9% |
| without-pay | 10 | 0.1% |
| Value | Count | Frequency (%) |
| private | 10258 | |
| | 884 | 6.3% |
| self-emp-not-inc | 857 | 6.1% |
| local-gov | 759 | 5.4% |
| self-emp-inc | 422 | 3.0% |
| state-gov | 420 | 3.0% |
| federal-gov | 396 | 2.8% |
| without-pay | 4 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 14288 | |
| t | 11989 | |
| a | 11594 | |
| v | 11584 | |
| i | 11349 | |
| r | 10142 | |
| P | 9743 | |
| - | 6159 | 5.6% |
| o | 3854 | 3.5% |
| l | 2882 | 2.6% |
| Other values (15) | 16520 |
| Value | Count | Frequency (%) |
| e | 14028 | |
| t | 11963 | |
| a | 11837 | |
| v | 11833 | |
| i | 11541 | |
| r | 10654 | |
| P | 10258 | |
| - | 4994 | 4.7% |
| o | 3195 | 3.0% |
| l | 2434 | 2.3% |
| Other values (15) | 13740 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 89945 | |
| Uppercase Letter | 13190 | 12.0% |
| Dash Punctuation | 6159 | 5.6% |
| Other Punctuation | 810 | 0.7% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 87483 | |
| Uppercase Letter | 13116 | 12.3% |
| Dash Punctuation | 4994 | 4.7% |
| Other Punctuation | 884 | 0.8% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 14288 | |
| t | 11989 | |
| a | 11594 | |
| v | 11584 | |
| i | 11349 | |
| r | 10142 | |
| o | 3854 | 4.3% |
| l | 2882 | 3.2% |
| n | 2712 | 3.0% |
| c | 2483 | 2.8% |
| Other values (8) | 7068 |
| Value | Count | Frequency (%) |
| e | 14028 | |
| t | 11963 | |
| a | 11837 | |
| v | 11833 | |
| i | 11541 | |
| r | 10654 | |
| o | 3195 | 3.7% |
| l | 2434 | 2.8% |
| n | 2136 | 2.4% |
| c | 2038 | 2.3% |
| Other values (8) | 5824 |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 9743 | |
| S | 2151 | 16.3% |
| L | 887 | 6.7% |
| F | 399 | 3.0% |
| W | 10 | 0.1% |
| Value | Count | Frequency (%) |
| P | 10258 | |
| S | 1699 | 13.0% |
| L | 759 | 5.8% |
| F | 396 | 3.0% |
| W | 4 | < 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 6159 |
| Value | Count | Frequency (%) |
| - | 4994 |
Other Punctuation
| Value | Count | Frequency (%) |
| ? | 810 |
| Value | Count | Frequency (%) |
| ? | 884 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 103135 | |
| Common | 6969 | 6.3% |
| Value | Count | Frequency (%) |
| Latin | 100599 | |
| Common | 5878 | 5.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 14288 | |
| t | 11989 | |
| a | 11594 | |
| v | 11584 | |
| i | 11349 | |
| r | 10142 | |
| P | 9743 | |
| o | 3854 | 3.7% |
| l | 2882 | 2.8% |
| n | 2712 | 2.6% |
| Other values (13) | 12998 |
| Value | Count | Frequency (%) |
| e | 14028 | |
| t | 11963 | |
| a | 11837 | |
| v | 11833 | |
| i | 11541 | |
| r | 10654 | |
| P | 10258 | |
| o | 3195 | 3.2% |
| l | 2434 | 2.4% |
| n | 2136 | 2.1% |
| Other values (13) | 10720 |
Common
| Value | Count | Frequency (%) |
| - | 6159 | |
| ? | 810 | 11.6% |
| Value | Count | Frequency (%) |
| - | 4994 | |
| ? | 884 | 15.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 110104 |
| Value | Count | Frequency (%) |
| ASCII | 106477 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 14288 | |
| t | 11989 | |
| a | 11594 | |
| v | 11584 | |
| i | 11349 | |
| r | 10142 | |
| P | 9743 | |
| - | 6159 | 5.6% |
| o | 3854 | 3.5% |
| l | 2882 | 2.6% |
| Other values (15) | 16520 |
| Value | Count | Frequency (%) |
| e | 14028 | |
| t | 11963 | |
| a | 11837 | |
| v | 11833 | |
| i | 11541 | |
| r | 10654 | |
| P | 10258 | |
| - | 4994 | 4.7% |
| o | 3195 | 3.0% |
| l | 2434 | 2.3% |
| Other values (15) | 13740 |
fnlwgt
Real number (ℝ)
| | Original Data | Synthetic Data |
|---|---|---|
| Distinct | 11258 | 7359 |
| Distinct (%) | 80.4% | 52.6% |
| Missing | 0 | 4 |
| Missing (%) | 0.0% | < 0.1% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 189421.81 | 193648.07 |
| | Original Data | Synthetic Data |
|---|---|---|
| Minimum | 12285 | 4 |
| Maximum | 1484705 | 15129410 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
Quantile statistics
| | Original Data | Synthetic Data |
|---|---|---|
| Minimum | 12285 | 4 |
| 5-th percentile | 39235.35 | 38772 |
| Q1 | 118095.5 | 116632 |
| median | 179533 | 178882.5 |
| Q3 | 236858.75 | 234460 |
| 95-th percentile | 377759.05 | 377930.25 |
| Maximum | 1484705 | 15129410 |
| Range | 1472420 | 15129406 |
| Interquartile range (IQR) | 118763.25 | 117828 |
Descriptive statistics
| | Original Data | Synthetic Data |
|---|---|---|
| Standard deviation | 104509.84 | 203944.58 |
| Coefficient of variation (CV) | 0.55173077 | 1.0531712 |
| Kurtosis | 6.02804 | 2147.0835 |
| Mean | 189421.81 | 193648.07 |
| Median Absolute Deviation (MAD) | 59954.5 | 59719.5 |
| Skewness | 1.3977123 | 33.690877 |
| Sum | 2.6519054 × 10⁹ | 2.7102984 × 10⁹ |
| Variance | 1.0922307 × 10¹⁰ | 4.159339 × 10¹⁰ |
| Monotonicity | Not monotonic | Not monotonic |
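For fnlwgt, the quantile-based statistics barely move between the two datasets while the moment-based ones change sharply. One possible reading, offered here as an interpretation rather than something the report states, is that a small number of extreme synthetic values drive the difference. The sketch below computes the relative change of each reported statistic:

```python
# Reported statistics for `fnlwgt` as (original, synthetic) pairs.
fnlwgt = {
    "median": (179533, 178882.5),
    "IQR": (118763.25, 117828),
    "MAD": (59954.5, 59719.5),
    "mean": (189421.81, 193648.07),
    "std": (104509.84, 203944.58),
    "max": (1484705, 15129410),
}


def rel_change(original: float, synthetic: float) -> float:
    """Relative change of the synthetic value with respect to the original."""
    return (synthetic - original) / original


changes = {name: rel_change(o, s) for name, (o, s) in fnlwgt.items()}

for name, change in changes.items():
    print(f"{name:>6}: {change:+.1%}")
```

The median, IQR, and MAD each change by less than 1%, whereas the standard deviation nearly doubles and the maximum grows roughly tenfold.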
| Value | Count | Frequency (%) |
| 177675 | 8 | 0.1% |
| 113364 | 8 | 0.1% |
| 125461 | 7 | 0.1% |
| 190290 | 7 | 0.1% |
| 193882 | 7 | 0.1% |
| 126675 | 7 | 0.1% |
| 194901 | 7 | 0.1% |
| 121124 | 7 | 0.1% |
| 117963 | 7 | 0.1% |
| 143582 | 6 | < 0.1% |
| Other values (11248) | 13929 |
| Value | Count | Frequency (%) |
| 148995 | 21 | 0.1% |
| 104196 | 19 | 0.1% |
| 32732 | 18 | 0.1% |
| 190290 | 18 | 0.1% |
| 143582 | 17 | 0.1% |
| 99185 | 17 | 0.1% |
| 417668 | 17 | 0.1% |
| 272944 | 16 | 0.1% |
| 144778 | 16 | 0.1% |
| 193882 | 15 | 0.1% |
| Other values (7349) | 13822 |
| Value | Count | Frequency (%) |
| 12285 | 1 | < 0.1% |
| 14878 | 1 | < 0.1% |
| 19214 | 1 | < 0.1% |
| 19302 | 3 | |
| 19395 | 2 | |
| 19410 | 1 | < 0.1% |
| 19700 | 1 | < 0.1% |
| 19752 | 1 | < 0.1% |
| 19793 | 1 | < 0.1% |
| 19899 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 4 | 1 | |
| 108 | 1 | |
| 828 | 1 | |
| 2013 | 1 | |
| 3125 | 1 | |
| 3413 | 1 | |
| 3487 | 1 | |
| 3788 | 1 | |
| 3908 | 1 | |
| 3911 | 1 |
education
Categorical
| | Original Data | Synthetic Data |
|---|---|---|
| Distinct | 16 | 16 |
| Distinct (%) | 0.1% | 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
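| Value (Original Data) | Count |
|---|---|
| HS-grad | 4452 |
| Some-college | 3163 |
| Bachelors | 2319 |
| Masters | 734 |
| Assoc-voc | 624 |
| Other values (11) | 2708 |

| Value (Synthetic Data) | Count |
|---|---|
| HS-grad | 4650 |
| Some-college | 3020 |
| Bachelors | 2522 |
| Assoc-voc | 707 |
| Masters | 650 |
| Other values (11) | 2451 |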
Length
| | Original Data | Synthetic Data |
|---|---|---|
| Max length | 12 | 12 |
| Median length | 11 | 11 |
| Mean length | 8.439 | 8.4553571 |
| Min length | 3 | 3 |
Characters and Unicode
| | Original Data | Synthetic Data |
|---|---|---|
| Total characters | 118146 | 118375 |
| Distinct characters | 31 | 31 |
| Distinct categories | 4 | 4 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Original Data | Synthetic Data |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Original Data | Synthetic Data |
|---|---|---|
| 1st row | 11th | 11th |
| 2nd row | Doctorate | 10th |
| 3rd row | Bachelors | 9th |
| 4th row | Assoc-acdm | 5th-6th |
| 5th row | Some-college | HS-grad |
Common Values
| Value | Count | Frequency (%) |
| HS-grad | 4452 | |
| Some-college | 3163 | |
| Bachelors | 2319 | |
| Masters | 734 | 5.2% |
| Assoc-voc | 624 | 4.5% |
| 11th | 528 | 3.8% |
| Assoc-acdm | 452 | 3.2% |
| 10th | 408 | 2.9% |
| 7th-8th | 297 | 2.1% |
| Prof-school | 233 | 1.7% |
| Other values (6) | 790 | 5.6% |
| Value | Count | Frequency (%) |
| HS-grad | 4650 | |
| Some-college | 3020 | |
| Bachelors | 2522 | |
| Assoc-voc | 707 | 5.1% |
| Masters | 650 | 4.6% |
| 11th | 470 | 3.4% |
| Assoc-acdm | 407 | 2.9% |
| 10th | 355 | 2.5% |
| 7th-8th | 299 | 2.1% |
| Prof-school | 202 | 1.4% |
| Other values (6) | 718 | 5.1% |
Length
Common Values (Plot)
Original Data: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Synthetic Data: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
| Value | Count | Frequency (%) |
| hs-grad | 4452 | |
| some-college | 3163 | |
| bachelors | 2319 | |
| masters | 734 | 5.2% |
| assoc-voc | 624 | 4.5% |
| 11th | 528 | 3.8% |
| assoc-acdm | 452 | 3.2% |
| 10th | 408 | 2.9% |
| 7th-8th | 297 | 2.1% |
| prof-school | 233 | 1.7% |
| Other values (6) | 790 | 5.6% |
| Value | Count | Frequency (%) |
| hs-grad | 4650 | |
| some-college | 3020 | |
| bachelors | 2522 | |
| assoc-voc | 707 | 5.1% |
| masters | 650 | 4.6% |
| 11th | 470 | 3.4% |
| assoc-acdm | 407 | 2.9% |
| 10th | 355 | 2.5% |
| 7th-8th | 299 | 2.1% |
| prof-school | 202 | 1.4% |
| Other values (6) | 718 | 5.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 12723 | |
| o | 11406 | 9.7% |
| - | 9434 | 8.0% |
| l | 8894 | 7.5% |
| a | 8122 | 6.9% |
| c | 8048 | 6.8% |
| r | 7919 | 6.7% |
| g | 7615 | 6.4% |
| S | 7615 | 6.4% |
| s | 6256 | 5.3% |
| Other values (21) | 30114 |
| Value | Count | Frequency (%) |
| e | 12421 | |
| o | 11367 | 9.6% |
| - | 9495 | 8.0% |
| l | 8783 | 7.4% |
| a | 8399 | 7.1% |
| r | 8213 | 6.9% |
| c | 8161 | 6.9% |
| g | 7670 | 6.5% |
| S | 7670 | 6.5% |
| s | 6329 | 5.3% |
| Other values (21) | 29867 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 88627 | |
| Uppercase Letter | 16610 | 14.1% |
| Dash Punctuation | 9434 | 8.0% |
| Decimal Number | 3475 | 2.9% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 88735 | |
| Uppercase Letter | 16997 | 14.4% |
| Dash Punctuation | 9495 | 8.0% |
| Decimal Number | 3148 | 2.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 12723 | |
| o | 11406 | |
| l | 8894 | |
| a | 8122 | |
| c | 8048 | |
| r | 7919 | |
| g | 7615 | |
| s | 6256 | |
| d | 4904 | 5.5% |
| h | 4852 | 5.5% |
| Other values (4) | 7888 |
| Value | Count | Frequency (%) |
| e | 12421 | |
| o | 11367 | |
| l | 8783 | |
| a | 8399 | |
| r | 8213 | |
| c | 8161 | |
| g | 7670 | |
| s | 6329 | |
| d | 5057 | |
| h | 4847 | 5.5% |
| Other values (4) | 7488 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 9434 |
| Value | Count | Frequency (%) |
| - | 9495 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 7615 | |
| H | 4452 | |
| B | 2319 | 14.0% |
| A | 1076 | 6.5% |
| M | 734 | 4.4% |
| P | 249 | 1.5% |
| D | 165 | 1.0% |
| Value | Count | Frequency (%) |
| S | 7670 | |
| H | 4650 | |
| B | 2522 | 14.8% |
| A | 1114 | 6.6% |
| M | 650 | 3.8% |
| P | 221 | 1.3% |
| D | 170 | 1.0% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 1719 | |
| 0 | 408 | 11.7% |
| 7 | 297 | 8.5% |
| 8 | 297 | 8.5% |
| 9 | 209 | 6.0% |
| 2 | 187 | 5.4% |
| 5 | 145 | 4.2% |
| 6 | 145 | 4.2% |
| 4 | 68 | 2.0% |
| Value | Count | Frequency (%) |
| 1 | 1514 | |
| 0 | 355 | 11.3% |
| 7 | 299 | 9.5% |
| 8 | 299 | 9.5% |
| 2 | 161 | 5.1% |
| 9 | 158 | 5.0% |
| 5 | 152 | 4.8% |
| 6 | 152 | 4.8% |
| 4 | 58 | 1.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 105237 | |
| Common | 12909 | 10.9% |
| Value | Count | Frequency (%) |
| Latin | 105732 | |
| Common | 12643 | 10.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 12723 | |
| o | 11406 | |
| l | 8894 | |
| a | 8122 | 7.7% |
| c | 8048 | 7.6% |
| r | 7919 | 7.5% |
| g | 7615 | 7.2% |
| S | 7615 | 7.2% |
| s | 6256 | 5.9% |
| d | 4904 | 4.7% |
| Other values (11) | 21735 |
| Value | Count | Frequency (%) |
| e | 12421 | |
| o | 11367 | |
| l | 8783 | |
| a | 8399 | 7.9% |
| r | 8213 | 7.8% |
| c | 8161 | 7.7% |
| g | 7670 | 7.3% |
| S | 7670 | 7.3% |
| s | 6329 | 6.0% |
| d | 5057 | 4.8% |
| Other values (11) | 21662 |
Common
| Value | Count | Frequency (%) |
| - | 9434 | |
| 1 | 1719 | 13.3% |
| 0 | 408 | 3.2% |
| 7 | 297 | 2.3% |
| 8 | 297 | 2.3% |
| 9 | 209 | 1.6% |
| 2 | 187 | 1.4% |
| 5 | 145 | 1.1% |
| 6 | 145 | 1.1% |
| 4 | 68 | 0.5% |
| Value | Count | Frequency (%) |
| - | 9495 | |
| 1 | 1514 | 12.0% |
| 0 | 355 | 2.8% |
| 7 | 299 | 2.4% |
| 8 | 299 | 2.4% |
| 2 | 161 | 1.3% |
| 9 | 158 | 1.2% |
| 5 | 152 | 1.2% |
| 6 | 152 | 1.2% |
| 4 | 58 | 0.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 118146 |
| Value | Count | Frequency (%) |
| ASCII | 118375 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 12723 | |
| o | 11406 | 9.7% |
| - | 9434 | 8.0% |
| l | 8894 | 7.5% |
| a | 8122 | 6.9% |
| c | 8048 | 6.8% |
| r | 7919 | 6.7% |
| g | 7615 | 6.4% |
| S | 7615 | 6.4% |
| s | 6256 | 5.3% |
| Other values (21) | 30114 |
| Value | Count | Frequency (%) |
| e | 12421 | |
| o | 11367 | 9.6% |
| - | 9495 | 8.0% |
| l | 8783 | 7.4% |
| a | 8399 | 7.1% |
| r | 8213 | 6.9% |
| c | 8161 | 6.9% |
| g | 7670 | 6.5% |
| S | 7670 | 6.5% |
| s | 6329 | 5.3% |
| Other values (21) | 29867 |
education_num
Real number (ℝ)
| | Original Data | Synthetic Data |
|---|---|---|
| Distinct | 16 | 16 |
| Distinct (%) | 0.1% | 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 10.071714 | 10.116571 |
| | Original Data | Synthetic Data |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 16 | 16 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
Quantile statistics
| | Original Data | Synthetic Data |
|---|---|---|
| Minimum | 1 | 1 |
| 5-th percentile | 5 | 6 |
| Q1 | 9 | 9 |
| median | 10 | 10 |
| Q3 | 12 | 13 |
| 95-th percentile | 14 | 14 |
| Maximum | 16 | 16 |
| Range | 15 | 15 |
| Interquartile range (IQR) | 3 | 4 |
Descriptive statistics
| | Original Data | Synthetic Data |
|---|---|---|
| Standard deviation | 2.5620383 | 2.521849 |
| Coefficient of variation (CV) | 0.25437956 | 0.24927902 |
| Kurtosis | 0.58744484 | 0.71457805 |
| Mean | 10.071714 | 10.116571 |
| Median Absolute Deviation (MAD) | 1 | 1 |
| Skewness | -0.31432088 | -0.34253642 |
| Sum | 141004 | 141632 |
| Variance | 6.5640402 | 6.3597225 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 9 | 4452 | |
| 10 | 3163 | |
| 13 | 2319 | |
| 14 | 734 | 5.2% |
| 11 | 624 | 4.5% |
| 7 | 528 | 3.8% |
| 12 | 452 | 3.2% |
| 6 | 408 | 2.9% |
| 4 | 297 | 2.1% |
| 15 | 233 | 1.7% |
| Other values (6) | 790 | 5.6% |
| Value | Count | Frequency (%) |
| 9 | 4650 | |
| 10 | 3020 | |
| 13 | 2522 | |
| 11 | 707 | 5.1% |
| 14 | 650 | 4.6% |
| 7 | 470 | 3.4% |
| 12 | 407 | 2.9% |
| 6 | 355 | 2.5% |
| 4 | 299 | 2.1% |
| 15 | 202 | 1.4% |
| Other values (6) | 718 | 5.1% |
| Value | Count | Frequency (%) |
| 1 | 16 | 0.1% |
| 2 | 68 | 0.5% |
| 3 | 145 | 1.0% |
| 4 | 297 | 2.1% |
| 5 | 209 | 1.5% |
| 6 | 408 | 2.9% |
| 7 | 528 | 3.8% |
| 8 | 187 | 1.3% |
| 9 | 4452 | |
| 10 | 3163 |
| Value | Count | Frequency (%) |
| 1 | 19 | 0.1% |
| 2 | 58 | 0.4% |
| 3 | 152 | 1.1% |
| 4 | 299 | 2.1% |
| 5 | 158 | 1.1% |
| 6 | 355 | 2.5% |
| 7 | 470 | 3.4% |
| 8 | 161 | 1.1% |
| 9 | 4650 | |
| 10 | 3020 |
marital_status
Categorical
| | Original Data | Synthetic Data |
|---|---|---|
| Distinct | 7 | 9 |
| Distinct (%) | 0.1% | 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
| Value (Original Data) | Count |
|---|---|
| Married-civ-spouse | 6417 |
| Never-married | 4661 |
| Divorced | 1880 |
| Separated | 439 |
| Widowed | 426 |
| Other values (2) | 177 |

| Value (Synthetic Data) | Count |
|---|---|
| Married-civ-spouse | 6269 |
| Never-married | 4932 |
| Divorced | 2023 |
| Separated | 387 |
| Widowed | 368 |
| Other values (4) | 21 |
Length
| | Original Data | Synthetic Data |
|---|---|---|
| Max length | 21 | 21 |
| Median length | 18 | 18 |
| Mean length | 14.412071 | 14.258786 |
| Min length | 7 | 7 |
Characters and Unicode
| | Original Data | Synthetic Data |
|---|---|---|
| Total characters | 201769 | 199623 |
| Distinct characters | 24 | 24 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Original Data | Synthetic Data |
|---|---|---|
| Unique | 0 | 3 |
| Unique (%) | 0.0% | < 0.1% |
Sample
| | Original Data | Synthetic Data |
|---|---|---|
| 1st row | Never-married | Married-civ-spouse |
| 2nd row | Divorced | Married-civ-spouse |
| 3rd row | Never-married | Married-civ-spouse |
| 4th row | Divorced | Married-civ-spouse |
| 5th row | Never-married | Never-married |
Common Values
| Value | Count | Frequency (%) |
| Married-civ-spouse | 6417 | |
| Never-married | 4661 | |
| Divorced | 1880 | 13.4% |
| Separated | 439 | 3.1% |
| Widowed | 426 | 3.0% |
| Married-spouse-absent | 172 | 1.2% |
| Married-AF-spouse | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| Married-civ-spouse | 6269 | |
| Never-married | 4932 | |
| Divorced | 2023 | 14.4% |
| Separated | 387 | 2.8% |
| Widowed | 368 | 2.6% |
| Married-spouse-absent | 18 | 0.1% |
| rried-civ-spouse | 1 | < 0.1% |
| -civ-spouse | 1 | < 0.1% |
| Married-AF-spouse | 1 | < 0.1% |
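The synthetic table contains two values, `rried-civ-spouse` and `-civ-spouse`, that do not occur in the original data and look like truncated copies of an existing category. A minimal sketch of one way to flag such values, assuming that a suffix match against the original categories is a reasonable heuristic (the report itself does not perform this check):

```python
# Category sets transcribed from the two Common Values tables above.
original = {
    "Married-civ-spouse", "Never-married", "Divorced", "Separated",
    "Widowed", "Married-spouse-absent", "Married-AF-spouse",
}
synthetic = original | {"rried-civ-spouse", "-civ-spouse"}

# Categories that exist only in the synthetic data.
novel = synthetic - original

# For each novel value, list the original categories that end with it.
# A match suggests the value is a truncated copy rather than a new category.
truncated_from = {
    value: sorted(cat for cat in original if cat.endswith(value))
    for value in novel
}

for value, sources in sorted(truncated_from.items()):
    print(f"{value!r} may be a truncation of {sources}")
```

Both novel values map back to `Married-civ-spouse`, which accounts for the synthetic distinct count of 9 against 7 in the original.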
Length
Common Values (Plot)
Original Data
Synthetic Data
| Value | Count | Frequency (%) |
| married-civ-spouse | 6417 | |
| never-married | 4661 | |
| divorced | 1880 | 13.4% |
| separated | 439 | 3.1% |
| widowed | 426 | 3.0% |
| married-spouse-absent | 172 | 1.2% |
| married-af-spouse | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| married-civ-spouse | 6269 | |
| never-married | 4932 | |
| divorced | 2023 | 14.4% |
| separated | 387 | 2.8% |
| widowed | 368 | 2.6% |
| married-spouse-absent | 18 | 0.1% |
| rried-civ-spouse | 1 | < 0.1% |
| civ-spouse | 1 | < 0.1% |
| married-af-spouse | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 30527 | |
| r | 29490 | |
| i | 19978 | |
| - | 17849 | |
| d | 14426 | |
| s | 13360 | 6.6% |
| v | 12958 | 6.4% |
| a | 12305 | 6.1% |
| o | 8900 | 4.4% |
| c | 8297 | 4.1% |
| Other values (14) | 33679 |
| Value | Count | Frequency (%) |
| e | 30558 | |
| r | 29784 | |
| i | 19883 | |
| - | 17512 | |
| d | 14367 | |
| v | 13226 | |
| s | 12598 | 6.3% |
| a | 12012 | 6.0% |
| o | 8681 | 4.3% |
| c | 8294 | 4.2% |
| Other values (14) | 32708 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 169910 | |
| Dash Punctuation | 17849 | 8.8% |
| Uppercase Letter | 14010 | 6.9% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 168111 | |
| Dash Punctuation | 17512 | 8.8% |
| Uppercase Letter | 14000 | 7.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 30527 | |
| r | 29490 | |
| i | 19978 | |
| d | 14426 | |
| s | 13360 | |
| v | 12958 | |
| a | 12305 | |
| o | 8900 | 5.2% |
| c | 8297 | 4.9% |
| p | 7033 | 4.1% |
| Other values (6) | 12636 |
| Value | Count | Frequency (%) |
| e | 30558 | |
| r | 29784 | |
| i | 19883 | |
| d | 14367 | |
| v | 13226 | |
| s | 12598 | |
| a | 12012 | 7.1% |
| o | 8681 | 5.2% |
| c | 8294 | 4.9% |
| p | 6677 | 4.0% |
| Other values (6) | 12031 | 7.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 17849 |
| Value | Count | Frequency (%) |
| - | 17512 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 6594 | |
| N | 4661 | |
| D | 1880 | 13.4% |
| S | 439 | 3.1% |
| W | 426 | 3.0% |
| A | 5 | < 0.1% |
| F | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| M | 6288 | |
| N | 4932 | |
| D | 2023 | 14.4% |
| S | 387 | 2.8% |
| W | 368 | 2.6% |
| A | 1 | < 0.1% |
| F | 1 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 183920 | |
| Common | 17849 | 8.8% |
| Value | Count | Frequency (%) |
| Latin | 182111 | |
| Common | 17512 | 8.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 30527 | |
| r | 29490 | |
| i | 19978 | |
| d | 14426 | |
| s | 13360 | |
| v | 12958 | |
| a | 12305 | |
| o | 8900 | 4.8% |
| c | 8297 | 4.5% |
| p | 7033 | 3.8% |
| Other values (13) | 26646 |
| Value | Count | Frequency (%) |
| e | 30558 | |
| r | 29784 | |
| i | 19883 | |
| d | 14367 | |
| v | 13226 | |
| s | 12598 | |
| a | 12012 | 6.6% |
| o | 8681 | 4.8% |
| c | 8294 | 4.6% |
| p | 6677 | 3.7% |
| Other values (13) | 26031 |
Common
| Value | Count | Frequency (%) |
| - | 17849 |
| Value | Count | Frequency (%) |
| - | 17512 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 201769 |
| Value | Count | Frequency (%) |
| ASCII | 199623 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 30527 | |
| r | 29490 | |
| i | 19978 | |
| - | 17849 | |
| d | 14426 | |
| s | 13360 | 6.6% |
| v | 12958 | 6.4% |
| a | 12305 | 6.1% |
| o | 8900 | 4.4% |
| c | 8297 | 4.1% |
| Other values (14) | 33679 |
| Value | Count | Frequency (%) |
| e | 30558 | |
| r | 29784 | |
| i | 19883 | |
| - | 17512 | |
| d | 14367 | |
| v | 13226 | |
| s | 12598 | 6.3% |
| a | 12012 | 6.0% |
| o | 8681 | 4.3% |
| c | 8294 | 4.2% |
| Other values (14) | 32708 |
occupation
Categorical
| | Original Data | Synthetic Data |
|---|---|---|
| Distinct | 15 | 19 |
| Distinct (%) | 0.1% | 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
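| Value (Original Data) | Count |
|---|---|
| Craft-repair | 1766 |
| Exec-managerial | 1753 |
| Prof-specialty | 1749 |
| Adm-clerical | 1620 |
| Sales | 1575 |
| Other values (10) | 5537 |

| Value (Synthetic Data) | Count |
|---|---|
| Adm-clerical | 2310 |
| Craft-repair | 2161 |
| Prof-specialty | 1780 |
| Exec-managerial | 1598 |
| Other-service | 1277 |
| Other values (14) | 4874 |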
Length
| | Original Data | Synthetic Data |
|---|---|---|
| Max length | 17 | 17 |
| Median length | 15 | 15 |
| Mean length | 12.188 | 12.230929 |
| Min length | 1 | 1 |
Characters and Unicode
| | Original Data | Synthetic Data |
|---|---|---|
| Total characters | 170632 | 171233 |
| Distinct characters | 32 | 32 |
| Distinct categories | 4 | 4 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Original Data | Synthetic Data |
|---|---|---|
| Unique | 0 | 4 |
| Unique (%) | 0.0% | < 0.1% |
Sample
| | Original Data | Synthetic Data |
|---|---|---|
| 1st row | ? | ? |
| 2nd row | Sales | ? |
| 3rd row | Craft-repair | ? |
| 4th row | Sales | Sales |
| 5th row | ? | ? |
Common Values
| Value | Count | Frequency (%) |
| Craft-repair | 1766 | |
| Exec-managerial | 1753 | |
| Prof-specialty | 1749 | |
| Adm-clerical | 1620 | |
| Sales | 1575 | |
| Other-service | 1416 | |
| Machine-op-inspct | 875 | |
| ? | 810 | |
| Transport-moving | 694 | 5.0% |
| Handlers-cleaners | 580 | 4.1% |
| Other values (5) | 1162 |
| Value | Count | Frequency (%) |
| Adm-clerical | 2310 | |
| Craft-repair | 2161 | |
| Prof-specialty | 1780 | |
| Exec-managerial | 1598 | |
| Other-service | 1277 | |
| Sales | 1244 | |
| Transport-moving | 884 | 6.3% |
| ? | 765 | 5.5% |
| Machine-op-inspct | 602 | 4.3% |
| Handlers-cleaners | 473 | 3.4% |
| Other values (9) | 906 | 6.5% |
Length
Common Values (Plot)
Original Data: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Synthetic Data: Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
| Value | Count | Frequency (%) |
| craft-repair | 1766 | |
| exec-managerial | 1753 | |
| prof-specialty | 1749 | |
| adm-clerical | 1620 | |
| sales | 1575 | |
| other-service | 1416 | |
| machine-op-inspct | 875 | |
| | 810 | |
| transport-moving | 694 | 5.0% |
| handlers-cleaners | 580 | 4.1% |
| Other values (5) | 1162 |
| Value | Count | Frequency (%) |
| adm-clerical | 2310 | |
| craft-repair | 2161 | |
| prof-specialty | 1780 | |
| exec-managerial | 1598 | |
| other-service | 1277 | |
| sales | 1244 | |
| transport-moving | 884 | 6.3% |
| | 765 | 5.5% |
| machine-op-inspct | 602 | 4.3% |
| handlers-cleaners | 473 | 3.4% |
| Other values (9) | 906 | 6.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 18492 | 10.8% |
| r | 17334 | 10.2% |
| a | 16877 | 9.9% |
| - | 12566 | 7.4% |
| i | 12355 | 7.2% |
| c | 11161 | 6.5% |
| l | 9477 | 5.6% |
| s | 8707 | 5.1% |
| t | 7461 | 4.4% |
| n | 6877 | 4.0% |
| Other values (22) | 49325 |
| Value | Count | Frequency (%) |
| r | 18533 | 10.8% |
| e | 17354 | 10.1% |
| a | 17294 | 10.1% |
| - | 12684 | 7.4% |
| i | 12654 | 7.4% |
| c | 11363 | 6.6% |
| l | 10190 | 6.0% |
| s | 7722 | 4.5% |
| t | 7230 | 4.2% |
| p | 6617 | 3.9% |
| Other values (22) | 49592 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 144062 | |
| Uppercase Letter | 13194 | 7.7% |
| Dash Punctuation | 12566 | 7.4% |
| Other Punctuation | 810 | 0.5% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 144550 | |
| Uppercase Letter | 13234 | 7.7% |
| Dash Punctuation | 12684 | 7.4% |
| Other Punctuation | 765 | 0.4% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 18492 | |
| r | 17334 | |
| a | 16877 | |
| i | 12355 | |
| c | 11161 | 7.7% |
| l | 9477 | 6.6% |
| s | 8707 | 6.0% |
| t | 7461 | 5.2% |
| n | 6877 | 4.8% |
| p | 6713 | 4.7% |
| Other values (10) | 28608 |
| Value | Count | Frequency (%) |
| r | 18533 | |
| e | 17354 | |
| a | 17294 | |
| i | 12654 | |
| c | 11363 | 7.9% |
| l | 10190 | 7.0% |
| s | 7722 | 5.3% |
| t | 7230 | 5.0% |
| p | 6617 | 4.6% |
| n | 6345 | 4.4% |
| Other values (10) | 29248 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 12566 |
| Value | Count | Frequency (%) |
| - | 12684 |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 2117 | |
| C | 1766 | |
| E | 1753 | |
| A | 1624 | |
| S | 1575 | |
| O | 1416 | |
| T | 1071 | |
| M | 875 | |
| H | 580 | 4.4% |
| F | 417 | 3.2% |
| Value | Count | Frequency (%) |
| A | 2310 | |
| C | 2161 | |
| P | 1979 | |
| E | 1599 | |
| O | 1277 | |
| S | 1244 | |
| T | 1179 | |
| M | 602 | 4.5% |
| H | 473 | 3.6% |
| F | 410 | 3.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| ? | 810 |
| Value | Count | Frequency (%) |
| ? | 765 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 157256 | |
| Common | 13376 | 7.8% |
| Value | Count | Frequency (%) |
| Latin | 157784 | |
| Common | 13449 | 7.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 18492 | |
| r | 17334 | |
| a | 16877 | |
| i | 12355 | 7.9% |
| c | 11161 | 7.1% |
| l | 9477 | 6.0% |
| s | 8707 | 5.5% |
| t | 7461 | 4.7% |
| n | 6877 | 4.4% |
| p | 6713 | 4.3% |
| Other values (20) | 41802 |
| Value | Count | Frequency (%) |
| r | 18533 | |
| e | 17354 | |
| a | 17294 | |
| i | 12654 | 8.0% |
| c | 11363 | 7.2% |
| l | 10190 | 6.5% |
| s | 7722 | 4.9% |
| t | 7230 | 4.6% |
| p | 6617 | 4.2% |
| n | 6345 | 4.0% |
| Other values (20) | 42482 |
Common
| Value | Count | Frequency (%) |
| - | 12566 | |
| ? | 810 | 6.1% |
| Value | Count | Frequency (%) |
| - | 12684 | |
| ? | 765 | 5.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 170632 |
| Value | Count | Frequency (%) |
| ASCII | 171233 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 18492 | 10.8% |
| r | 17334 | 10.2% |
| a | 16877 | 9.9% |
| - | 12566 | 7.4% |
| i | 12355 | 7.2% |
| c | 11161 | 6.5% |
| l | 9477 | 5.6% |
| s | 8707 | 5.1% |
| t | 7461 | 4.4% |
| n | 6877 | 4.0% |
| Other values (22) | 49325 |
| Value | Count | Frequency (%) |
| r | 18533 | 10.8% |
| e | 17354 | 10.1% |
| a | 17294 | 10.1% |
| - | 12684 | 7.4% |
| i | 12654 | 7.4% |
| c | 11363 | 6.6% |
| l | 10190 | 6.0% |
| s | 7722 | 4.5% |
| t | 7230 | 4.2% |
| p | 6617 | 3.9% |
| Other values (22) | 49592 |
relationship
Categorical
| Original Data | Synthetic Data | |
|---|---|---|
| Distinct | 6 | 8 |
| Distinct (%) | < 0.1% | 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
| Husband | |
|---|---|
| Not-in-family | |
| Own-child | |
| Unmarried | |
| Wife |
| Husband | |
|---|---|
| Not-in-family | |
| Own-child | |
| Unmarried | |
| Wife | |
| Other values (3) | 304 |
Length
| Original Data | Synthetic Data | |
|---|---|---|
| Max length | 14 | 14 |
| Median length | 13 | 13 |
| Mean length | 9.1222857 | 9.1011429 |
| Min length | 4 | 2 |
Characters and Unicode
| Original Data | Synthetic Data | |
|---|---|---|
| Total characters | 127712 | 127416 |
| Distinct characters | 25 | 25 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| Original Data | Synthetic Data | |
|---|---|---|
| Unique | 0 | 2 |
| Unique (%) | 0.0% | < 0.1% |
Sample
| Original Data | Synthetic Data | |
|---|---|---|
| 1st row | Unmarried | Husband |
| 2nd row | Not-in-family | Husband |
| 3rd row | Not-in-family | Husband |
| 4th row | Not-in-family | Husband |
| 5th row | Unmarried | Own-child |
Common Values
| Value | Count | Frequency (%) |
| Husband | 5627 | 40.2% |
| Not-in-family | 3579 | 25.6% |
| Own-child | 2253 | 16.1% |
| Unmarried | 1433 | 10.2% |
| Wife | 689 | 4.9% |
| Other-relative | 419 | 3.0% |
| Value | Count | Frequency (%) |
| Husband | 5396 | 38.5% |
| Not-in-family | 3646 | 26.0% |
| Own-child | 2710 | 19.4% |
| Unmarried | 1168 | 8.3% |
| Wife | 776 | 5.5% |
| Other-relative | 302 | 2.2% |
| ld | 1 | < 0.1% |
| Ownmarried | 1 | < 0.1% |
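The two Common Values tables above can be cross-checked with a short script. The following is a minimal sketch, not part of the generated report: it assumes the counts are copied by hand from the tables, recomputes each frequency as count divided by the total number of rows, and lists the categories that appear only in the synthetic data. The names `original`, `synthetic`, `frequencies`, and `unseen` are illustrative.

```python
# Counts copied from the relationship "Common Values" tables above.
original = {
    "Husband": 5627, "Not-in-family": 3579, "Own-child": 2253,
    "Unmarried": 1433, "Wife": 689, "Other-relative": 419,
}
synthetic = {
    "Husband": 5396, "Not-in-family": 3646, "Own-child": 2710,
    "Unmarried": 1168, "Wife": 776, "Other-relative": 302,
    "ld": 1, "Ownmarried": 1,
}


def frequencies(counts):
    """Return each category's share of all rows, in percent."""
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}


# Categories present in the synthetic data but absent from the original.
unseen = sorted(set(synthetic) - set(original))

for value, pct in frequencies(synthetic).items():
    flag = "  <- not in original" if value in unseen else ""
    print(f"{value:15s} {pct:5.1f}%{flag}")
```

Under these assumptions, the script flags `ld` and `Ownmarried` as values that do not appear in the original data.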
Length
Common Values (Plot)
Original Data
Synthetic Data
| Value | Count | Frequency (%) |
| husband | 5627 | |
| not-in-family | 3579 | |
| own-child | 2253 | |
| unmarried | 1433 | 10.2% |
| wife | 689 | 4.9% |
| other-relative | 419 | 3.0% |
| Value | Count | Frequency (%) |
| husband | 5396 | |
| not-in-family | 3646 | |
| own-child | 2710 | |
| unmarried | 1168 | 8.3% |
| wife | 776 | 5.5% |
| other-relative | 302 | 2.2% |
| ld | 1 | < 0.1% |
| ownmarried | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 12892 | 10.1% |
| i | 11952 | 9.4% |
| a | 11058 | 8.7% |
| - | 9830 | 7.7% |
| d | 9313 | 7.3% |
| l | 6251 | 4.9% |
| H | 5627 | 4.4% |
| u | 5627 | 4.4% |
| s | 5627 | 4.4% |
| b | 5627 | 4.4% |
| Other values (15) | 43908 |
| Value | Count | Frequency (%) |
| n | 12921 | 10.1% |
| i | 12249 | 9.6% |
| a | 10513 | 8.3% |
| - | 10304 | 8.1% |
| d | 9276 | 7.3% |
| l | 6659 | 5.2% |
| H | 5396 | 4.2% |
| s | 5396 | 4.2% |
| b | 5396 | 4.2% |
| u | 5396 | 4.2% |
| Other values (15) | 43910 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 103882 | |
| Uppercase Letter | 14000 | 11.0% |
| Dash Punctuation | 9830 | 7.7% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 103113 | |
| Uppercase Letter | 13999 | 11.0% |
| Dash Punctuation | 10304 | 8.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 12892 | |
| i | 11952 | |
| a | 11058 | |
| d | 9313 | 9.0% |
| l | 6251 | 6.0% |
| u | 5627 | 5.4% |
| s | 5627 | 5.4% |
| b | 5627 | 5.4% |
| m | 5012 | 4.8% |
| t | 4417 | 4.3% |
| Other values (9) | 26106 |
| Value | Count | Frequency (%) |
| n | 12921 | |
| i | 12249 | |
| a | 10513 | |
| d | 9276 | 9.0% |
| l | 6659 | 6.5% |
| s | 5396 | 5.2% |
| b | 5396 | 5.2% |
| u | 5396 | 5.2% |
| m | 4815 | 4.7% |
| f | 4422 | 4.3% |
| Other values (9) | 26070 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 9830 |
| Value | Count | Frequency (%) |
| - | 10304 |
Uppercase Letter
| Value | Count | Frequency (%) |
| H | 5627 | |
| N | 3579 | |
| O | 2672 | |
| U | 1433 | 10.2% |
| W | 689 | 4.9% |
| Value | Count | Frequency (%) |
| H | 5396 | |
| N | 3646 | |
| O | 3013 | |
| U | 1168 | 8.3% |
| W | 776 | 5.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 117882 | |
| Common | 9830 | 7.7% |
| Value | Count | Frequency (%) |
| Latin | 117112 | |
| Common | 10304 | 8.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 12892 | 10.9% |
| i | 11952 | 10.1% |
| a | 11058 | 9.4% |
| d | 9313 | 7.9% |
| l | 6251 | 5.3% |
| H | 5627 | 4.8% |
| u | 5627 | 4.8% |
| s | 5627 | 4.8% |
| b | 5627 | 4.8% |
| m | 5012 | 4.3% |
| Other values (14) | 38896 |
| Value | Count | Frequency (%) |
| n | 12921 | 11.0% |
| i | 12249 | 10.5% |
| a | 10513 | 9.0% |
| d | 9276 | 7.9% |
| l | 6659 | 5.7% |
| H | 5396 | 4.6% |
| s | 5396 | 4.6% |
| b | 5396 | 4.6% |
| u | 5396 | 4.6% |
| m | 4815 | 4.1% |
| Other values (14) | 39095 |
Common
| Value | Count | Frequency (%) |
| - | 9830 |
| Value | Count | Frequency (%) |
| - | 10304 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 127712 |
| Value | Count | Frequency (%) |
| ASCII | 127416 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 12892 | 10.1% |
| i | 11952 | 9.4% |
| a | 11058 | 8.7% |
| - | 9830 | 7.7% |
| d | 9313 | 7.3% |
| l | 6251 | 4.9% |
| H | 5627 | 4.4% |
| u | 5627 | 4.4% |
| s | 5627 | 4.4% |
| b | 5627 | 4.4% |
| Other values (15) | 43908 |
| Value | Count | Frequency (%) |
| n | 12921 | 10.1% |
| i | 12249 | 9.6% |
| a | 10513 | 8.3% |
| - | 10304 | 8.1% |
| d | 9276 | 7.3% |
| l | 6659 | 5.2% |
| H | 5396 | 4.2% |
| s | 5396 | 4.2% |
| b | 5396 | 4.2% |
| u | 5396 | 4.2% |
| Other values (15) | 43910 |
race
Categorical
| Original Data | Synthetic Data | |
|---|---|---|
| Distinct | 5 | 5 |
| Distinct (%) | < 0.1% | < 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
| White | |
|---|---|
| Black | |
| Asian-Pac-Islander | 463 |
| Amer-Indian-Eskimo | 132 |
| Other | 114 |
| White | |
|---|---|
| Black | 738 |
| Asian-Pac-Islander | 330 |
| Amer-Indian-Eskimo | 113 |
| Other | 83 |
Length
| Original Data | Synthetic Data | |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 5 | 5 |
| Mean length | 5.5525 | 5.4113571 |
| Min length | 5 | 5 |
Characters and Unicode
| Original Data | Synthetic Data | |
|---|---|---|
| Total characters | 77735 | 75759 |
| Distinct characters | 22 | 22 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| Original Data | Synthetic Data | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| Original Data | Synthetic Data | |
|---|---|---|
| 1st row | White | White |
| 2nd row | White | White |
| 3rd row | White | White |
| 4th row | White | White |
| 5th row | White | White |
Common Values
| Value | Count | Frequency (%) |
| White | 11935 | |
| Black | 1356 | 9.7% |
| Asian-Pac-Islander | 463 | 3.3% |
| Amer-Indian-Eskimo | 132 | 0.9% |
| Other | 114 | 0.8% |
| Value | Count | Frequency (%) |
| White | 12736 | |
| Black | 738 | 5.3% |
| Asian-Pac-Islander | 330 | 2.4% |
| Amer-Indian-Eskimo | 113 | 0.8% |
| Other | 83 | 0.6% |
Length
Common Values (Plot)
Original Data
Synthetic Data
| Value | Count | Frequency (%) |
| white | 11935 | |
| black | 1356 | 9.7% |
| asian-pac-islander | 463 | 3.3% |
| amer-indian-eskimo | 132 | 0.9% |
| other | 114 | 0.8% |
| Value | Count | Frequency (%) |
| white | 12736 | |
| black | 738 | 5.3% |
| asian-pac-islander | 330 | 2.4% |
| amer-indian-eskimo | 113 | 0.8% |
| other | 83 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| i | 12662 | |
| e | 12644 | |
| t | 12049 | |
| h | 12049 | |
| W | 11935 | |
| a | 2877 | 3.7% |
| l | 1819 | 2.3% |
| c | 1819 | 2.3% |
| k | 1488 | 1.9% |
| B | 1356 | 1.7% |
| Other values (12) | 7037 |
| Value | Count | Frequency (%) |
| i | 13292 | |
| e | 13262 | |
| h | 12819 | |
| t | 12819 | |
| W | 12736 | |
| a | 1841 | 2.4% |
| l | 1068 | 1.4% |
| c | 1068 | 1.4% |
| - | 886 | 1.2% |
| n | 886 | 1.2% |
| Other values (12) | 5082 | 6.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 61355 | |
| Uppercase Letter | 15190 | 19.5% |
| Dash Punctuation | 1190 | 1.5% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 59987 | |
| Uppercase Letter | 14886 | 19.6% |
| Dash Punctuation | 886 | 1.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| i | 12662 | |
| e | 12644 | |
| t | 12049 | |
| h | 12049 | |
| a | 2877 | 4.7% |
| l | 1819 | 3.0% |
| c | 1819 | 3.0% |
| k | 1488 | 2.4% |
| n | 1190 | 1.9% |
| s | 1058 | 1.7% |
| Other values (4) | 1700 | 2.8% |
| Value | Count | Frequency (%) |
| i | 13292 | |
| e | 13262 | |
| h | 12819 | |
| t | 12819 | |
| a | 1841 | 3.1% |
| l | 1068 | 1.8% |
| c | 1068 | 1.8% |
| n | 886 | 1.5% |
| k | 851 | 1.4% |
| s | 773 | 1.3% |
| Other values (4) | 1308 | 2.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| W | 11935 | |
| B | 1356 | 8.9% |
| A | 595 | 3.9% |
| I | 595 | 3.9% |
| P | 463 | 3.0% |
| E | 132 | 0.9% |
| O | 114 | 0.8% |
| Value | Count | Frequency (%) |
| W | 12736 | |
| B | 738 | 5.0% |
| A | 443 | 3.0% |
| I | 443 | 3.0% |
| P | 330 | 2.2% |
| E | 113 | 0.8% |
| O | 83 | 0.6% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1190 |
| Value | Count | Frequency (%) |
| - | 886 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 76545 | |
| Common | 1190 | 1.5% |
| Value | Count | Frequency (%) |
| Latin | 74873 | |
| Common | 886 | 1.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| i | 12662 | |
| e | 12644 | |
| t | 12049 | |
| h | 12049 | |
| W | 11935 | |
| a | 2877 | 3.8% |
| l | 1819 | 2.4% |
| c | 1819 | 2.4% |
| k | 1488 | 1.9% |
| B | 1356 | 1.8% |
| Other values (11) | 5847 |
| Value | Count | Frequency (%) |
| i | 13292 | |
| e | 13262 | |
| h | 12819 | |
| t | 12819 | |
| W | 12736 | |
| a | 1841 | 2.5% |
| l | 1068 | 1.4% |
| c | 1068 | 1.4% |
| n | 886 | 1.2% |
| k | 851 | 1.1% |
| Other values (11) | 4231 | 5.7% |
Common
| Value | Count | Frequency (%) |
| - | 1190 |
| Value | Count | Frequency (%) |
| - | 886 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 77735 |
| Value | Count | Frequency (%) |
| ASCII | 75759 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| i | 12662 | |
| e | 12644 | |
| t | 12049 | |
| h | 12049 | |
| W | 11935 | |
| a | 2877 | 3.7% |
| l | 1819 | 2.3% |
| c | 1819 | 2.3% |
| k | 1488 | 1.9% |
| B | 1356 | 1.7% |
| Other values (12) | 7037 |
| Value | Count | Frequency (%) |
| i | 13292 | |
| e | 13262 | |
| h | 12819 | |
| t | 12819 | |
| W | 12736 | |
| a | 1841 | 2.4% |
| l | 1068 | 1.4% |
| c | 1068 | 1.4% |
| - | 886 | 1.2% |
| n | 886 | 1.2% |
| Other values (12) | 5082 | 6.7% |
gender
Categorical
| Original Data | Synthetic Data | |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | < 0.1% | < 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
| Male | |
|---|---|
| Female |
| Male | |
|---|---|
| Female |
Length
| Original Data | Synthetic Data | |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.6611429 | 4.7067143 |
| Min length | 4 | 4 |
Characters and Unicode
| Original Data | Synthetic Data | |
|---|---|---|
| Total characters | 65256 | 65894 |
| Distinct characters | 6 | 6 |
| Distinct categories | 2 | 2 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| Original Data | Synthetic Data | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| Original Data | Synthetic Data | |
|---|---|---|
| 1st row | Male | Male |
| 2nd row | Male | Male |
| 3rd row | Male | Male |
| 4th row | Male | Male |
| 5th row | Male | Male |
Common Values
| Value | Count | Frequency (%) |
| Male | 9372 | 66.9% |
| Female | 4628 | 33.1% |
| Value | Count | Frequency (%) |
| Male | 9053 | 64.7% |
| Female | 4947 | 35.3% |
Length
Common Values (Plot)
Original Data
Synthetic Data
| Value | Count | Frequency (%) |
| male | 9372 | |
| female | 4628 |
| Value | Count | Frequency (%) |
| male | 9053 | |
| female | 4947 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 18628 | |
| a | 14000 | |
| l | 14000 | |
| M | 9372 | |
| F | 4628 | 7.1% |
| m | 4628 | 7.1% |
| Value | Count | Frequency (%) |
| e | 18947 | |
| a | 14000 | |
| l | 14000 | |
| M | 9053 | |
| F | 4947 | 7.5% |
| m | 4947 | 7.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 51256 | |
| Uppercase Letter | 14000 | 21.5% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 51894 | |
| Uppercase Letter | 14000 | 21.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 18628 | |
| a | 14000 | |
| l | 14000 | |
| m | 4628 | 9.0% |
| Value | Count | Frequency (%) |
| e | 18947 | |
| a | 14000 | |
| l | 14000 | |
| m | 4947 | 9.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 9372 | |
| F | 4628 |
| Value | Count | Frequency (%) |
| M | 9053 | |
| F | 4947 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 65256 |
| Value | Count | Frequency (%) |
| Latin | 65894 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 18628 | |
| a | 14000 | |
| l | 14000 | |
| M | 9372 | |
| F | 4628 | 7.1% |
| m | 4628 | 7.1% |
| Value | Count | Frequency (%) |
| e | 18947 | |
| a | 14000 | |
| l | 14000 | |
| M | 9053 | |
| F | 4947 | 7.5% |
| m | 4947 | 7.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 65256 |
| Value | Count | Frequency (%) |
| ASCII | 65894 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 18628 | |
| a | 14000 | |
| l | 14000 | |
| M | 9372 | |
| F | 4628 | 7.1% |
| m | 4628 | 7.1% |
| Value | Count | Frequency (%) |
| e | 18947 | |
| a | 14000 | |
| l | 14000 | |
| M | 9053 | |
| F | 4947 | 7.5% |
| m | 4947 | 7.5% |
capital_gain
Categorical
| Original Data | Synthetic Data | |
|---|---|---|
| Distinct | 109 | 108 |
| Distinct (%) | 0.8% | 0.8% |
| Missing | 0 | 2 |
| Missing (%) | 0.0% | < 0.1% |
| Memory size | 109.5 KiB | 109.5 KiB |
| 0 | |
|---|---|
| 15024 | 161 |
| 7688 | 127 |
| 7298 | 108 |
| 99999 | 76 |
| Other values (104) | 717 |
| 0 | |
|---|---|
| 7298 | 231 |
| 15024 | 137 |
| 7688 | 112 |
| 5178 | 41 |
| Other values (103) | 398 |
Length
| | Synthetic Data |
|---|---|
| Max length | 6 |
| Median length | 1 |
| Mean length | 1.210673 |
| Min length | 1 |
Characters and Unicode
| | Synthetic Data |
|---|---|
| Total characters | 16947 |
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Original Data | Synthetic Data | |
|---|---|---|
| Unique | 19 | 51 |
| Unique (%) | 0.1% | 0.4% |
Sample
| | Synthetic Data |
|---|---|
| 1st row | 0 |
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 12811 | 91.5% |
| 15024 | 161 | 1.1% |
| 7688 | 127 | 0.9% |
| 7298 | 108 | 0.8% |
| 99999 | 76 | 0.5% |
| 3103 | 46 | 0.3% |
| 5178 | 43 | 0.3% |
| 5013 | 33 | 0.2% |
| 4386 | 29 | 0.2% |
| 2174 | 27 | 0.2% |
| Other values (99) | 539 | 3.9% |
| Value | Count | Frequency (%) |
| 0 | 13079 | 93.4% |
| 7298 | 231 | 1.7% |
| 15024 | 137 | 1.0% |
| 7688 | 112 | 0.8% |
| 5178 | 41 | 0.3% |
| 4650 | 33 | 0.2% |
| 99999 | 29 | 0.2% |
| 4386 | 28 | 0.2% |
| 5013 | 24 | 0.2% |
| 3103 | 17 | 0.1% |
| Other values (98) | 267 | 1.9% |
Length
Common Values (Plot)
Original Data
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Synthetic Data
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
| Value | Count | Frequency (%) |
| 0 | 13080 | |
| 7298 | 231 | 1.7% |
| 15024 | 137 | 1.0% |
| 7688 | 112 | 0.8% |
| 5178 | 41 | 0.3% |
| 4650 | 33 | 0.2% |
| 99999 | 29 | 0.2% |
| 4386 | 28 | 0.2% |
| 5013 | 24 | 0.2% |
| 3103 | 17 | 0.1% |
| Other values (98) | 267 | 1.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 13397 | |
| 8 | 634 | 3.7% |
| 7 | 474 | 2.8% |
| 2 | 474 | 2.8% |
| 9 | 429 | 2.5% |
| 1 | 404 | 2.4% |
| 4 | 368 | 2.2% |
| 5 | 322 | 1.9% |
| 6 | 260 | 1.5% |
| 3 | 184 | 1.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16946 | |
| Space Separator | 1 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 13397 | |
| 8 | 634 | 3.7% |
| 7 | 474 | 2.8% |
| 2 | 474 | 2.8% |
| 9 | 429 | 2.5% |
| 1 | 404 | 2.4% |
| 4 | 368 | 2.2% |
| 5 | 322 | 1.9% |
| 6 | 260 | 1.5% |
| 3 | 184 | 1.1% |
Space Separator
| Value | Count | Frequency (%) |
| 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 16947 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 13397 | |
| 8 | 634 | 3.7% |
| 7 | 474 | 2.8% |
| 2 | 474 | 2.8% |
| 9 | 429 | 2.5% |
| 1 | 404 | 2.4% |
| 4 | 368 | 2.2% |
| 5 | 322 | 1.9% |
| 6 | 260 | 1.5% |
| 3 | 184 | 1.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16947 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 13397 | |
| 8 | 634 | 3.7% |
| 7 | 474 | 2.8% |
| 2 | 474 | 2.8% |
| 9 | 429 | 2.5% |
| 1 | 404 | 2.4% |
| 4 | 368 | 2.2% |
| 5 | 322 | 1.9% |
| 6 | 260 | 1.5% |
| 3 | 184 | 1.1% |
capital_loss
Real number (ℝ)
| Original Data | Synthetic Data | |
|---|---|---|
| Distinct | 76 | 55 |
| Distinct (%) | 0.5% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 84.930214 | 46.901357 |
| Original Data | Synthetic Data | |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 3900 | 25485 |
| Zeros | 13354 | 13659 |
| Zeros (%) | 95.4% | 97.6% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
Quantile statistics
| Original Data | Synthetic Data | |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 0 | 0 |
| Maximum | 3900 | 25485 |
| Range | 3900 | 25485 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| Original Data | Synthetic Data | |
|---|---|---|
| Standard deviation | 394.66496 | 387.47855 |
| Coefficient of variation (CV) | 4.6469324 | 8.2615637 |
| Kurtosis | 20.340582 | 1676.9944 |
| Mean | 84.930214 | 46.901357 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 4.615084 | 30.0347 |
| Sum | 1189023 | 656619 |
| Variance | 155760.43 | 150139.63 |
| Monotonicity | Not monotonic | Not monotonic |
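Two of the descriptive statistics above follow from the mean and standard deviation: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The following is a minimal sketch, not part of the generated report, that recomputes both from the values in the table. The names `stats` and `derived` are illustrative.

```python
# Mean and standard deviation copied from the capital_loss table above.
stats = {
    "original": {"mean": 84.930214, "std": 394.66496},
    "synthetic": {"mean": 46.901357, "std": 387.47855},
}


def derived(mean, std):
    """Return (coefficient of variation, variance) for a mean and standard deviation."""
    return std / mean, std ** 2


results = {name: derived(**values) for name, values in stats.items()}

for name, (cv, variance) in results.items():
    print(f"{name}: CV = {cv:.4f}, variance = {variance:.2f}")
```

The synthetic data has a similar standard deviation but a much lower mean, which is why its coefficient of variation is nearly twice that of the original.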
| Value | Count | Frequency (%) |
| 0 | 13354 | 95.4% |
| 1902 | 82 | 0.6% |
| 1887 | 66 | 0.5% |
| 1977 | 61 | 0.4% |
| 1485 | 26 | 0.2% |
| 1590 | 23 | 0.2% |
| 1740 | 22 | 0.2% |
| 2415 | 22 | 0.2% |
| 1602 | 21 | 0.1% |
| 1876 | 20 | 0.1% |
| Other values (66) | 303 | 2.2% |
| Value | Count | Frequency (%) |
| 0 | 13659 | 97.6% |
| 1902 | 64 | 0.5% |
| 1977 | 57 | 0.4% |
| 1485 | 24 | 0.2% |
| 1672 | 23 | 0.2% |
| 1887 | 22 | 0.2% |
| 1876 | 17 | 0.1% |
| 2377 | 11 | 0.1% |
| 1590 | 11 | 0.1% |
| 2002 | 9 | 0.1% |
| Other values (45) | 103 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 13354 | |
| 213 | 1 | < 0.1% |
| 323 | 2 | < 0.1% |
| 419 | 2 | < 0.1% |
| 625 | 9 | 0.1% |
| 653 | 1 | < 0.1% |
| 810 | 2 | < 0.1% |
| 880 | 3 | < 0.1% |
| 974 | 1 | < 0.1% |
| 1092 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 13659 | |
| 2 | 1 | < 0.1% |
| 176 | 1 | < 0.1% |
| 180 | 1 | < 0.1% |
| 200 | 1 | < 0.1% |
| 204 | 1 | < 0.1% |
| 625 | 4 | < 0.1% |
| 810 | 2 | < 0.1% |
| 880 | 1 | < 0.1% |
| 1051 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 13659 | |
| 2 | 1 | < 0.1% |
| 176 | 1 | < 0.1% |
| 180 | 1 | < 0.1% |
| 200 | 1 | < 0.1% |
| 204 | 1 | < 0.1% |
| 625 | 4 | < 0.1% |
| 810 | 2 | < 0.1% |
| 880 | 1 | < 0.1% |
| 1051 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 13354 | |
| 213 | 1 | < 0.1% |
| 323 | 2 | < 0.1% |
| 419 | 2 | < 0.1% |
| 625 | 9 | 0.1% |
| 653 | 1 | < 0.1% |
| 810 | 2 | < 0.1% |
| 880 | 3 | < 0.1% |
| 974 | 1 | < 0.1% |
| 1092 | 4 | < 0.1% |
hours_per_week
Real number (ℝ)
| Original Data | Synthetic Data | |
|---|---|---|
| Distinct | 90 | 82 |
| Distinct (%) | 0.6% | 0.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 40.240929 | 39.731357 |
| Original Data | Synthetic Data | |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 99 | 762 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
Quantile statistics
| Original Data | Synthetic Data | |
|---|---|---|
| Minimum | 1 | 1 |
| 5-th percentile | 16 | 16 |
| Q1 | 40 | 40 |
| median | 40 | 40 |
| Q3 | 45 | 43 |
| 95-th percentile | 60 | 60 |
| Maximum | 99 | 762 |
| Range | 98 | 761 |
| Interquartile range (IQR) | 5 | 3 |
Descriptive statistics
| Original Data | Synthetic Data | |
|---|---|---|
| Standard deviation | 12.368062 | 15.4285 |
| Coefficient of variation (CV) | 0.30735031 | 0.38832049 |
| Kurtosis | 2.9757174 | 767.0054 |
| Mean | 40.240929 | 39.731357 |
| Median Absolute Deviation (MAD) | 4 | 1.5 |
| Skewness | 0.23608827 | 17.54265 |
| Sum | 563373 | 556239 |
| Variance | 152.96895 | 238.03862 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 40 | 6534 | 46.7% |
| 50 | 1173 | 8.4% |
| 45 | 794 | 5.7% |
| 60 | 615 | 4.4% |
| 35 | 563 | 4.0% |
| 20 | 522 | 3.7% |
| 30 | 519 | 3.7% |
| 25 | 309 | 2.2% |
| 55 | 283 | 2.0% |
| 48 | 212 | 1.5% |
| Other values (80) | 2476 | 17.7% |
| Value | Count | Frequency (%) |
| 40 | 6976 | 49.8% |
| 50 | 1136 | 8.1% |
| 45 | 683 | 4.9% |
| 20 | 603 | 4.3% |
| 30 | 588 | 4.2% |
| 60 | 525 | 3.8% |
| 35 | 454 | 3.2% |
| 25 | 285 | 2.0% |
| 55 | 219 | 1.6% |
| 15 | 213 | 1.5% |
| Other values (72) | 2318 | 16.6% |
| Value | Count | Frequency (%) |
| 1 | 10 | 0.1% |
| 2 | 16 | 0.1% |
| 3 | 21 | 0.1% |
| 4 | 22 | 0.2% |
| 5 | 24 | 0.2% |
| 6 | 28 | 0.2% |
| 7 | 11 | 0.1% |
| 8 | 62 | |
| 9 | 5 | < 0.1% |
| 10 | 124 |
| Value | Count | Frequency (%) |
| 1 | 10 | 0.1% |
| 2 | 12 | 0.1% |
| 3 | 13 | 0.1% |
| 4 | 3 | < 0.1% |
| 5 | 6 | < 0.1% |
| 6 | 37 | 0.3% |
| 7 | 3 | < 0.1% |
| 8 | 75 | |
| 10 | 184 | |
| 11 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 10 | 0.1% |
| 2 | 12 | 0.1% |
| 3 | 13 | 0.1% |
| 4 | 3 | < 0.1% |
| 5 | 6 | < 0.1% |
| 6 | 37 | 0.3% |
| 7 | 3 | < 0.1% |
| 8 | 75 | |
| 10 | 184 | |
| 11 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 10 | 0.1% |
| 2 | 16 | 0.1% |
| 3 | 21 | 0.1% |
| 4 | 22 | 0.2% |
| 5 | 24 | 0.2% |
| 6 | 28 | 0.2% |
| 7 | 11 | 0.1% |
| 8 | 62 | |
| 9 | 5 | < 0.1% |
| 10 | 124 |
native_country
Categorical
| Original Data | Synthetic Data | |
|---|---|---|
| Distinct | 42 | 48 |
| Distinct (%) | 0.3% | 0.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
| United-States | |
|---|---|
| Mexico | 274 |
| ? | 269 |
| Philippines | 86 |
| Germany | 71 |
| Other values (37) | 768 |
| United-States | |
|---|---|
| Mexico | 304 |
| ? | 199 |
| El-Salvador | 85 |
| Germany | 68 |
| Other values (43) | 660 |
Length
| Original Data | Synthetic Data | |
|---|---|---|
| Max length | 26 | 26 |
| Median length | 13 | 13 |
| Mean length | 12.276857 | 12.372929 |
| Min length | 1 | 1 |
Characters and Unicode
| Original Data | Synthetic Data | |
|---|---|---|
| Total characters | 171876 | 173221 |
| Distinct characters | 45 | 45 |
| Distinct categories | 6 | 6 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| Original Data | Synthetic Data | |
|---|---|---|
| Unique | 1 | 9 |
| Unique (%) | < 0.1% | 0.1% |
Sample
| Original Data | Synthetic Data | |
|---|---|---|
| 1st row | United-States | Taiwan |
| 2nd row | United-States | Japan |
| 3rd row | United-States | South |
| 4th row | United-States | Italy |
| 5th row | United-States | Mexico |
Common Values
| Value | Count | Frequency (%) |
| United-States | 12532 | 89.5% |
| Mexico | 274 | 2.0% |
| ? | 269 | 1.9% |
| Philippines | 86 | 0.6% |
| Germany | 71 | 0.5% |
| Canada | 56 | 0.4% |
| El-Salvador | 48 | 0.3% |
| India | 44 | 0.3% |
| Puerto-Rico | 42 | 0.3% |
| England | 40 | 0.3% |
| Other values (32) | 538 | 3.8% |
| Value | Count | Frequency (%) |
| United-States | 12684 | 90.6% |
| Mexico | 304 | 2.2% |
| ? | 199 | 1.4% |
| El-Salvador | 85 | 0.6% |
| Germany | 68 | 0.5% |
| Italy | 65 | 0.5% |
| China | 64 | 0.5% |
| Canada | 61 | 0.4% |
| Columbia | 45 | 0.3% |
| Vietnam | 41 | 0.3% |
| Other values (38) | 384 | 2.7% |
Length
Common Values (Plot)
Original Data
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Synthetic Data
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
| Value | Count | Frequency (%) |
| united-states | 12532 | |
| mexico | 274 | 2.0% |
| | 269 | 1.9% |
| philippines | 86 | 0.6% |
| germany | 71 | 0.5% |
| canada | 56 | 0.4% |
| el-salvador | 48 | 0.3% |
| india | 44 | 0.3% |
| puerto-rico | 42 | 0.3% |
| england | 40 | 0.3% |
| Other values (32) | 538 | 3.8% |
| Value | Count | Frequency (%) |
| united-states | 12684 | |
| mexico | 304 | 2.2% |
| | 199 | 1.4% |
| el-salvador | 85 | 0.6% |
| germany | 68 | 0.5% |
| italy | 65 | 0.5% |
| china | 64 | 0.5% |
| canada | 61 | 0.4% |
| columbia | 45 | 0.3% |
| vietnam | 41 | 0.3% |
| Other values (38) | 384 | 2.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 37803 | |
| e | 25720 | |
| a | 13639 | 7.9% |
| i | 13446 | 7.8% |
| n | 13141 | 7.6% |
| d | 12814 | 7.5% |
| - | 12663 | 7.4% |
| s | 12641 | 7.4% |
| S | 12626 | 7.3% |
| U | 12540 | 7.3% |
| Other values (35) | 4843 | 2.8% |
| Value | Count | Frequency (%) |
| t | 38309 | |
| e | 25968 | |
| a | 13777 | 8.0% |
| i | 13484 | 7.8% |
| n | 13181 | 7.6% |
| d | 12919 | 7.5% |
| - | 12843 | 7.4% |
| S | 12811 | 7.4% |
| s | 12749 | 7.4% |
| U | 12694 | 7.3% |
| Other values (35) | 4486 | 2.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 132510 | |
| Uppercase Letter | 26418 | 15.4% |
| Dash Punctuation | 12663 | 7.4% |
| Other Punctuation | 277 | 0.2% |
| Open Punctuation | 4 | < 0.1% |
| Close Punctuation | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 133526 | |
| Uppercase Letter | 26649 | 15.4% |
| Dash Punctuation | 12843 | 7.4% |
| Other Punctuation | 201 | 0.1% |
| Open Punctuation | 1 | < 0.1% |
| Close Punctuation | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| t | 37803 | |
| e | 25720 | |
| a | 13639 | 10.3% |
| i | 13446 | 10.1% |
| n | 13141 | 9.9% |
| d | 12814 | 9.7% |
| s | 12641 | 9.5% |
| o | 606 | 0.5% |
| c | 469 | 0.4% |
| l | 407 | 0.3% |
| Other values (11) | 1824 | 1.4% |
| Value | Count | Frequency (%) |
| t | 38309 | |
| e | 25968 | |
| a | 13777 | 10.3% |
| i | 13484 | 10.1% |
| n | 13181 | 9.9% |
| d | 12919 | 9.7% |
| s | 12749 | 9.5% |
| o | 594 | 0.4% |
| l | 448 | 0.3% |
| c | 447 | 0.3% |
| Other values (11) | 1650 | 1.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 12663 |
| Value | Count | Frequency (%) |
| - | 12843 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 12626 | |
| U | 12540 | |
| M | 274 | 1.0% |
| P | 182 | 0.7% |
| C | 156 | 0.6% |
| G | 121 | 0.5% |
| I | 108 | 0.4% |
| E | 101 | 0.4% |
| R | 70 | 0.3% |
| J | 57 | 0.2% |
| Other values (9) | 183 | 0.7% |
| Value | Count | Frequency (%) |
| S | 12811 | |
| U | 12694 | |
| M | 304 | 1.1% |
| C | 195 | 0.7% |
| E | 118 | 0.4% |
| G | 102 | 0.4% |
| I | 82 | 0.3% |
| P | 80 | 0.3% |
| R | 56 | 0.2% |
| V | 42 | 0.2% |
| Other values (9) | 165 | 0.6% |
Other Punctuation
| Value | Count | Frequency (%) |
| ? | 269 | |
| & | 8 | 2.9% |
| Value | Count | Frequency (%) |
| ? | 199 | |
| & | 2 | 1.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 4 |
| Value | Count | Frequency (%) |
| ( | 1 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 4 |
| Value | Count | Frequency (%) |
| ) | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 158928 | |
| Common | 12948 | 7.5% |
| Value | Count | Frequency (%) |
| Latin | 160175 | |
| Common | 13046 | 7.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| t | 37803 | |
| e | 25720 | |
| a | 13639 | 8.6% |
| i | 13446 | 8.5% |
| n | 13141 | 8.3% |
| d | 12814 | 8.1% |
| s | 12641 | 8.0% |
| S | 12626 | 7.9% |
| U | 12540 | 7.9% |
| o | 606 | 0.4% |
| Other values (30) | 3952 | 2.5% |
| Value | Count | Frequency (%) |
| t | 38309 | |
| e | 25968 | |
| a | 13777 | 8.6% |
| i | 13484 | 8.4% |
| n | 13181 | 8.2% |
| d | 12919 | 8.1% |
| S | 12811 | 8.0% |
| s | 12749 | 8.0% |
| U | 12694 | 7.9% |
| o | 594 | 0.4% |
| Other values (30) | 3689 | 2.3% |
Common
| Value | Count | Frequency (%) |
| - | 12663 | |
| ? | 269 | 2.1% |
| & | 8 | 0.1% |
| ( | 4 | < 0.1% |
| ) | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| - | 12843 | |
| ? | 199 | 1.5% |
| & | 2 | < 0.1% |
| ( | 1 | < 0.1% |
| ) | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 171876 |
| Value | Count | Frequency (%) |
| ASCII | 173221 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| t | 37803 | |
| e | 25720 | |
| a | 13639 | 7.9% |
| i | 13446 | 7.8% |
| n | 13141 | 7.6% |
| d | 12814 | 7.5% |
| - | 12663 | 7.4% |
| s | 12641 | 7.4% |
| S | 12626 | 7.3% |
| U | 12540 | 7.3% |
| Other values (35) | 4843 | 2.8% |
| Value | Count | Frequency (%) |
| t | 38309 | 22.1% |
| e | 25968 | 15.0% |
| a | 13777 | 8.0% |
| i | 13484 | 7.8% |
| n | 13181 | 7.6% |
| d | 12919 | 7.5% |
| - | 12843 | 7.4% |
| S | 12811 | 7.4% |
| s | 12749 | 7.4% |
| U | 12694 | 7.3% |
| Other values (35) | 4486 | 2.6% |
income_bracket
Categorical
| | Original Data | Synthetic Data |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | < 0.1% | < 0.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 109.5 KiB | 109.5 KiB |
Length
| | Original Data | Synthetic Data |
|---|---|---|
| Max length | 5 | 5 |
| Median length | 5 | 5 |
| Mean length | 4.7614286 | 4.8197857 |
| Min length | 4 | 4 |
Characters and Unicode
| | Original Data | Synthetic Data |
|---|---|---|
| Total characters | 66660 | 67477 |
| Distinct characters | 6 | 6 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Original Data | Synthetic Data |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Original Data | Synthetic Data |
|---|---|---|
| 1st row | <=50K | >50K |
| 2nd row | >50K | <=50K |
| 3rd row | <=50K | <=50K |
| 4th row | <=50K | <=50K |
| 5th row | <=50K | <=50K |
Common Values
| Value | Count | Frequency (%) |
| <=50K | 10660 | 76.1% |
| >50K | 3340 | 23.9% |
| Value | Count | Frequency (%) |
| <=50K | 11477 | 82.0% |
| >50K | 2523 | 18.0% |
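The class shares in the two Common Values tables can be recomputed directly from the counts. The following is a minimal Python sketch of that arithmetic; the dictionary and function names are illustrative and are not part of the report or of any profiling library.

```python
# Counts copied from the Common Values tables for income_bracket.
original_counts = {"<=50K": 10660, ">50K": 3340}
synthetic_counts = {"<=50K": 11477, ">50K": 2523}


def shares(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


original_pct = shares(original_counts)
synthetic_pct = shares(synthetic_counts)

# Percentage-point shift of each class from original to synthetic data.
shift = {v: round(synthetic_pct[v] - original_pct[v], 1) for v in original_pct}

print(original_pct)
print(synthetic_pct)
print(shift)
```

On these counts, the share of the >50K class falls from 23.9% in the original data to 18.0% in the synthetic data, a shift of about 5.9 percentage points.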
[Plots not reproduced: Length; Common Values (Plot), each for Original Data and Synthetic Data]
| Value | Count | Frequency (%) |
| 50k | 14000 | 100.0% |
| Value | Count | Frequency (%) |
| 50k | 14000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 14000 | 21.0% |
| 0 | 14000 | 21.0% |
| K | 14000 | 21.0% |
| < | 10660 | 16.0% |
| = | 10660 | 16.0% |
| > | 3340 | 5.0% |
| Value | Count | Frequency (%) |
| 5 | 14000 | 20.7% |
| 0 | 14000 | 20.7% |
| K | 14000 | 20.7% |
| < | 11477 | 17.0% |
| = | 11477 | 17.0% |
| > | 2523 | 3.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 28000 | 42.0% |
| Math Symbol | 24660 | 37.0% |
| Uppercase Letter | 14000 | 21.0% |
| Value | Count | Frequency (%) |
| Decimal Number | 28000 | 41.5% |
| Math Symbol | 25477 | 37.8% |
| Uppercase Letter | 14000 | 20.7% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 14000 | 50.0% |
| 0 | 14000 | 50.0% |
| Value | Count | Frequency (%) |
| 5 | 14000 | 50.0% |
| 0 | 14000 | 50.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| K | 14000 | 100.0% |
| Value | Count | Frequency (%) |
| K | 14000 | 100.0% |
Math Symbol
| Value | Count | Frequency (%) |
| < | 10660 | 43.2% |
| = | 10660 | 43.2% |
| > | 3340 | 13.5% |
| Value | Count | Frequency (%) |
| < | 11477 | 45.0% |
| = | 11477 | 45.0% |
| > | 2523 | 9.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 52660 | 79.0% |
| Latin | 14000 | 21.0% |
| Value | Count | Frequency (%) |
| Common | 53477 | 79.3% |
| Latin | 14000 | 20.7% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 5 | 14000 | 26.6% |
| 0 | 14000 | 26.6% |
| < | 10660 | 20.2% |
| = | 10660 | 20.2% |
| > | 3340 | 6.3% |
| Value | Count | Frequency (%) |
| 5 | 14000 | 26.2% |
| 0 | 14000 | 26.2% |
| < | 11477 | 21.5% |
| = | 11477 | 21.5% |
| > | 2523 | 4.7% |
Latin
| Value | Count | Frequency (%) |
| K | 14000 | 100.0% |
| Value | Count | Frequency (%) |
| K | 14000 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 66660 | 100.0% |
| Value | Count | Frequency (%) |
| ASCII | 67477 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 14000 | 21.0% |
| 0 | 14000 | 21.0% |
| K | 14000 | 21.0% |
| < | 10660 | 16.0% |
| = | 10660 | 16.0% |
| > | 3340 | 5.0% |
| Value | Count | Frequency (%) |
| 5 | 14000 | 20.7% |
| 0 | 14000 | 20.7% |
| K | 14000 | 20.7% |
| < | 11477 | 17.0% |
| = | 11477 | 17.0% |
| > | 2523 | 3.7% |
[Plots not reproduced: interaction plots for Original Data and Synthetic Data. For some variable pairs the report states "Interaction plot not present for dataset".]
Original Data
| | age | fnlwgt | education_num | capital_gain | capital_loss | hours_per_week | workclass | education | marital_status | occupation | relationship | race | gender | native_country | income_bracket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 1.000 | -0.081 | 0.061 | 0.132 | 0.052 | 0.152 | 0.124 | 0.119 | 0.281 | 0.123 | 0.279 | 0.022 | 0.128 | 0.032 | 0.321 |
| fnlwgt | -0.081 | 1.000 | -0.037 | -0.016 | -0.008 | -0.033 | 0.018 | 0.019 | 0.016 | 0.010 | 0.021 | 0.072 | 0.041 | 0.045 | 0.001 |
| education_num | 0.061 | -0.037 | 1.000 | 0.126 | 0.075 | 0.167 | 0.100 | 1.000 | 0.083 | 0.224 | 0.111 | 0.062 | 0.073 | 0.142 | 0.365 |
| capital_gain | 0.132 | -0.016 | 0.126 | 1.000 | -0.067 | 0.092 | 0.043 | 0.159 | 0.040 | 0.070 | 0.049 | 0.000 | 0.058 | 0.000 | 0.269 |
| capital_loss | 0.052 | -0.008 | 0.075 | -0.067 | 1.000 | 0.051 | 0.021 | 0.045 | 0.036 | 0.030 | 0.051 | 0.000 | 0.053 | 0.000 | 0.141 |
| hours_per_week | 0.152 | -0.033 | 0.167 | 0.092 | 0.051 | 1.000 | 0.123 | 0.089 | 0.117 | 0.143 | 0.161 | 0.054 | 0.238 | 0.022 | 0.267 |
| workclass | 0.124 | 0.018 | 0.100 | 0.043 | 0.021 | 0.123 | 1.000 | 0.109 | 0.083 | 0.427 | 0.097 | 0.053 | 0.140 | 0.028 | 0.170 |
| education | 0.119 | 0.019 | 1.000 | 0.159 | 0.045 | 0.089 | 0.109 | 1.000 | 0.095 | 0.187 | 0.122 | 0.070 | 0.091 | 0.134 | 0.371 |
| marital_status | 0.281 | 0.016 | 0.083 | 0.040 | 0.036 | 0.117 | 0.083 | 0.095 | 1.000 | 0.132 | 0.489 | 0.083 | 0.452 | 0.063 | 0.449 |
| occupation | 0.123 | 0.010 | 0.224 | 0.070 | 0.030 | 0.143 | 0.427 | 0.187 | 0.132 | 1.000 | 0.179 | 0.076 | 0.423 | 0.062 | 0.350 |
| relationship | 0.279 | 0.021 | 0.111 | 0.049 | 0.051 | 0.161 | 0.097 | 0.122 | 0.489 | 0.179 | 1.000 | 0.099 | 0.648 | 0.079 | 0.455 |
| race | 0.022 | 0.072 | 0.062 | 0.000 | 0.000 | 0.054 | 0.053 | 0.070 | 0.083 | 0.076 | 0.099 | 1.000 | 0.107 | 0.389 | 0.102 |
| gender | 0.128 | 0.041 | 0.073 | 0.058 | 0.053 | 0.238 | 0.140 | 0.091 | 0.452 | 0.423 | 0.648 | 0.107 | 1.000 | 0.042 | 0.213 |
| native_country | 0.032 | 0.045 | 0.142 | 0.000 | 0.000 | 0.022 | 0.028 | 0.134 | 0.063 | 0.062 | 0.079 | 0.389 | 0.042 | 1.000 | 0.087 |
| income_bracket | 0.321 | 0.001 | 0.365 | 0.269 | 0.141 | 0.267 | 0.170 | 0.371 | 0.449 | 0.350 | 0.455 | 0.102 | 0.213 | 0.087 | 1.000 |
Synthetic Data
| | age | fnlwgt | education_num | capital_loss | hours_per_week | workclass | education | marital_status | occupation | relationship | race | gender | native_country | income_bracket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 1.000 | -0.047 | 0.031 | -0.009 | 0.004 | 0.122 | 0.119 | 0.000 | 0.000 | 0.007 | 0.000 | 0.000 | 0.000 | 0.005 |
| fnlwgt | -0.047 | 1.000 | -0.028 | 0.004 | -0.007 | 0.006 | 0.000 | 0.000 | 0.015 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| education_num | 0.031 | -0.028 | 1.000 | -0.031 | -0.006 | 0.086 | 1.000 | 0.000 | 0.019 | 0.000 | 0.000 | 0.000 | 0.052 | 0.080 |
| capital_loss | -0.009 | 0.004 | -0.031 | 1.000 | 0.058 | 0.016 | 0.036 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.042 | 0.000 |
| hours_per_week | 0.004 | -0.007 | -0.006 | 0.058 | 1.000 | 0.026 | 0.000 | 0.000 | 0.000 | 0.044 | 0.000 | 0.022 | 0.000 | 0.020 |
| workclass | 0.122 | 0.006 | 0.086 | 0.016 | 0.026 | 1.000 | 0.095 | 0.000 | 0.018 | 0.015 | 0.014 | 0.000 | 0.025 | 0.076 |
| education | 0.119 | 0.000 | 1.000 | 0.036 | 0.000 | 0.095 | 1.000 | 0.008 | 0.013 | 0.012 | 0.015 | 0.000 | 0.048 | 0.116 |
| marital_status | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.008 | 1.000 | 0.111 | 0.412 | 0.032 | 0.415 | 0.000 | 0.139 |
| occupation | 0.000 | 0.015 | 0.019 | 0.000 | 0.000 | 0.018 | 0.013 | 0.111 | 1.000 | 0.159 | 0.013 | 0.232 | 0.017 | 0.091 |
| relationship | 0.007 | 0.000 | 0.000 | 0.000 | 0.044 | 0.015 | 0.012 | 0.412 | 0.159 | 1.000 | 0.081 | 0.637 | 0.000 | 0.157 |
| race | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.014 | 0.015 | 0.032 | 0.013 | 0.081 | 1.000 | 0.085 | 0.021 | 0.000 |
| gender | 0.000 | 0.000 | 0.000 | 0.000 | 0.022 | 0.000 | 0.000 | 0.415 | 0.232 | 0.637 | 0.085 | 1.000 | 0.035 | 0.061 |
| native_country | 0.000 | 0.000 | 0.052 | 0.042 | 0.000 | 0.025 | 0.048 | 0.000 | 0.017 | 0.000 | 0.021 | 0.035 | 1.000 | 0.086 |
| income_bracket | 0.005 | 0.000 | 0.080 | 0.000 | 0.020 | 0.076 | 0.116 | 0.139 | 0.091 | 0.157 | 0.000 | 0.061 | 0.086 | 1.000 |
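One way to read the two correlation tables side by side is to take the absolute difference for each variable pair. The sketch below does this for the income_bracket row only, with values copied from the tables above. capital_gain is left out because it does not appear in the synthetic table. The variable names are illustrative, and the ranking is a reading aid under these assumptions, not a metric computed by the report.

```python
# Correlation of each variable with income_bracket, copied from the tables
# above as (original, synthetic). capital_gain is omitted because the
# synthetic table has no entry for it.
income_corr = {
    "age": (0.321, 0.005),
    "fnlwgt": (0.001, 0.000),
    "education_num": (0.365, 0.080),
    "capital_loss": (0.141, 0.000),
    "hours_per_week": (0.267, 0.020),
    "workclass": (0.170, 0.076),
    "education": (0.371, 0.116),
    "marital_status": (0.449, 0.139),
    "occupation": (0.350, 0.091),
    "relationship": (0.455, 0.157),
    "race": (0.102, 0.000),
    "gender": (0.213, 0.061),
    "native_country": (0.087, 0.086),
}

# Absolute gap between original and synthetic values, rounded to 3 decimals.
gaps = {name: round(abs(o - s), 3) for name, (o, s) in income_corr.items()}

# Largest gaps first: these are the relationships the synthetic data
# reproduces least closely for this row.
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)

for name, gap in ranked:
    print(f"{name:15s} {gap:.3f}")
```

For this row the largest gap is for age (0.321 in the original data against 0.005 in the synthetic data), while native_country is almost unchanged (0.087 against 0.086).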
Original Data
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income_bracket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | ? | 157289 | 11th | 7 | Never-married | ? | Unmarried | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 1 | 33 | Private | 170769 | Doctorate | 16 | Divorced | Sales | Not-in-family | White | Male | 99999 | 0 | 60 | United-States | >50K |
| 2 | 37 | Private | 279029 | Bachelors | 13 | Never-married | Craft-repair | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 30 | Private | 255004 | Assoc-acdm | 12 | Divorced | Sales | Not-in-family | White | Male | 0 | 0 | 52 | United-States | <=50K |
| 4 | 24 | ? | 144898 | Some-college | 10 | Never-married | ? | Unmarried | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 5 | 59 | Private | 61885 | 12th | 8 | Divorced | Transport-moving | Other-relative | Black | Male | 0 | 0 | 35 | United-States | <=50K |
| 6 | 53 | Private | 96062 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | Greece | <=50K |
| 7 | 18 | Private | 208103 | 11th | 7 | Never-married | Other-service | Other-relative | White | Male | 0 | 0 | 25 | United-States | <=50K |
| 8 | 30 | Private | 190823 | Some-college | 10 | Never-married | Other-service | Own-child | Black | Female | 0 | 0 | 40 | United-States | <=50K |
| 9 | 22 | ? | 424494 | Some-college | 10 | Never-married | ? | Own-child | White | Male | 0 | 0 | 25 | United-States | <=50K |
Synthetic Data
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income_bracket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | ? | 97318.0 | 11th | 7 | Married-civ-spouse | ? | Husband | White | Male | 0 | 0 | 15 | Taiwan | >50K |
| 1 | 24 | Private | 227070.0 | 10th | 6 | Married-civ-spouse | ? | Husband | White | Male | 0 | 0 | 40 | Japan | <=50K |
| 2 | 51 | Private | 124187.0 | 9th | 5 | Married-civ-spouse | ? | Husband | White | Male | 0 | 0 | 30 | South | <=50K |
| 3 | 28 | Private | 128509.0 | 5th-6th | 3 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 2444 | 40 | Italy | <=50K |
| 4 | 48 | Private | 33155.0 | HS-grad | 9 | Never-married | ? | Own-child | White | Male | 0 | 0 | 38 | Mexico | <=50K |
| 5 | 61 | Private | 119986.0 | Masters | 14 | Divorced | Other-service | Unmarried | White | Female | 0 | 0 | 30 | Philippines | >50K |
| 6 | 43 | Private | 456236.0 | Masters | 14 | Married-civ-spouse | Other-service | Husband | White | Male | 0 | 0 | 60 | United-States | >50K |
| 7 | 56 | Private | 104945.0 | 7th-8th | 4 | Divorced | Prof-specialty | Unmarried | Other | Female | 0 | 0 | 60 | United-States | >50K |
| 8 | 23 | Private | 143582.0 | HS-grad | 9 | Never-married | Sales | Own-child | White | Male | 0 | 0 | 45 | United-States | >50K |
| 9 | 34 | Private | 405284.0 | Bachelors | 13 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | Philippines | <=50K |
Original Data
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income_bracket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13990 | 40 | Self-emp-not-inc | 98985 | HS-grad | 9 | Divorced | Exec-managerial | Not-in-family | Black | Male | 0 | 0 | 50 | United-States | <=50K |
| 13991 | 53 | Private | 68684 | HS-grad | 9 | Married-civ-spouse | Transport-moving | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 13992 | 17 | Private | 365613 | 10th | 6 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 10 | Canada | <=50K |
| 13993 | 50 | Private | 155594 | Assoc-acdm | 12 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 50 | United-States | >50K |
| 13994 | 27 | Private | 69757 | Some-college | 10 | Never-married | Adm-clerical | Not-in-family | White | Female | 0 | 0 | 60 | United-States | <=50K |
| 13995 | 39 | Local-gov | 178100 | Masters | 14 | Divorced | Prof-specialty | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 13996 | 40 | Private | 226608 | Some-college | 10 | Divorced | Tech-support | Not-in-family | White | Male | 0 | 0 | 30 | Guatemala | >50K |
| 13997 | 37 | Private | 295127 | HS-grad | 9 | Never-married | Machine-op-inspct | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 13998 | 46 | Private | 149640 | 7th-8th | 4 | Married-spouse-absent | Transport-moving | Not-in-family | White | Male | 0 | 0 | 45 | United-States | <=50K |
| 13999 | 57 | Private | 109015 | Some-college | 10 | Divorced | Sales | Not-in-family | White | Female | 0 | 0 | 48 | United-States | <=50K |
Synthetic Data
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income_bracket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13990 | 42 | Private | 194772.0 | Prof-school | 15 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | 0 | 0 | 30 | United-States | <=50K |
| 13991 | 28 | State-gov | 132551.0 | Bachelors | 13 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 13992 | 53 | Local-gov | 34173.0 | HS-grad | 9 | Married-civ-spouse | Transport-moving | Husband | White | Male | 0 | 0 | 8 | United-States | <=50K |
| 13993 | 38 | Local-gov | 209103.0 | Bachelors | 13 | Married-civ-spouse | Adm-clerical | Husband | White | Male | 0 | 0 | 10 | United-States | <=50K |
| 13994 | 46 | Private | 48885.0 | HS-grad | 9 | Married-civ-spouse | Transport-moving | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 13995 | 44 | Private | 193882.0 | HS-grad | 9 | Divorced | ? | Unmarried | White | Female | 2202 | 0 | 40 | United-States | <=50K |
| 13996 | 40 | Private | 99185.0 | Assoc-voc | 11 | Divorced | Sales | Unmarried | White | Female | 0 | 0 | 40 | ? | <=50K |
| 13997 | 41 | Private | 121130.0 | HS-grad | 9 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 44 | Canada | <=50K |
| 13998 | 47 | Private | 201734.0 | Assoc-voc | 11 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 40 | Mexico | <=50K |
| 13999 | 34 | Private | 34848.0 | Some-college | 10 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
Original Data
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income_bracket | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 25 | Private | 195994 | 1st-4th | 2 | Never-married | Priv-house-serv | Not-in-family | White | Female | 0 | 0 | 40 | Guatemala | <=50K | 3 |
| 0 | 21 | Private | 250051 | Some-college | 10 | Never-married | Prof-specialty | Own-child | White | Female | 0 | 0 | 10 | United-States | <=50K | 2 |
| 1 | 23 | Private | 240137 | 5th-6th | 3 | Never-married | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 55 | Mexico | <=50K | 2 |
| 3 | 25 | Private | 308144 | Bachelors | 13 | Never-married | Craft-repair | Not-in-family | White | Male | 0 | 0 | 40 | Mexico | <=50K | 2 |
| 4 | 27 | Private | 255582 | HS-grad | 9 | Never-married | Machine-op-inspct | Not-in-family | White | Female | 0 | 0 | 40 | United-States | <=50K | 2 |
| 5 | 28 | Private | 274679 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Male | 0 | 0 | 50 | United-States | <=50K | 2 |
| 6 | 42 | Private | 204235 | Some-college | 10 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 40 | United-States | >50K | 2 |
Synthetic Data
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income_bracket | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | ? | 124651.0 | 11th | 7 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K | 2 |
| 1 | 30 | Private | 196396.0 | Some-college | 10 | Divorced | Sales | Not-in-family | White | Female | 0 | 0 | 45 | United-States | <=50K | 2 |
| 2 | 33 | Private | 206609.0 | Bachelors | 13 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K | 2 |
| 3 | 45 | Private | 266860.0 | Masters | 14 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K | 2 |